Abstract


This document describes an active object recognition system which is based upon uncer- 
tain information processing. The context in which this work is to be placed is defined by 
the ultimate goal of developing a general framework for information fusion and information 
source selection in image understanding. An intelligent system combines knowledge about 
visual sources, several levels of abstraction in representation and matching, and classes of 
image understanding processes to fuse visual information obtained from actively selected 
information sources. The system presented in this thesis constitutes a partial solution of 
this research programme in the domain of active object recognition. 


The developed system actively repositions the camera to capture additional views and 
resolves ambiguities existing in classification results obtained from a single view. The ap- 
proach uses an appearance based object representation, namely the parametric eigenspace, 
but the action planning steps are largely independent of any details of the specific object 
recognition environment. 


In order to select the most promising next action the system evaluates the expected 
utility of each action at its disposal. To estimate the utility of an envisaged action mea- 
sures of non-specificity and numerical imprecision are used. Such measures are either 
introduced or reviewed in separate chapters of this thesis. In this mainly theoretical part
of the document, the problem of how to estimate the quality of an obtained classification
result is tackled in a fairly abstract manner. Thus the results of that analysis can also be
applied in a wider context than the one established through the considered active recog- 
nition application. 


The innovations presented in this work are 
• the construction of a new planning algorithm for active object recognition, and

• the comparison of different implementations of this algorithm based upon different
uncertainty calculi, plus

• the introduction and exploratory application of new measures of non-specificity and
numerical imprecision.


Recognition performance is measured on a set of realistic objects. It is clearly demon- 
strated that active fusion is able to solve recognition problems that cannot be solved by 
static approaches. Action planning is shown to be necessary and beneficial, as the results
of random action selection schemes are distinctly worse.


The single classifier can be made less sophisticated if multiple observations are taken. 
This is demonstrated by using a feature space of only 3 dimensions for a database that 
contains 14 objects at 72 poses. Nevertheless, the system is able to perform accurate pose 
estimations and object identifications. The chosen active approach therefore paves the 
way for the use of large object databases with numbers of object models far beyond those 
usually reported in the literature. 


It is found that the naive probabilistic implementation performs best for the chosen
model database if the outlier rate is low. If the outlier rate increases, averaging
fusion schemes are able to suppress outliers effectively, thereby achieving
robustness at reasonable cost.


The described approach for planned action selection has high potential to be successfully
applied to problems other than active object recognition. Various obvious extensions
are discussed: different feature spaces, multiple sensors, multi-object scenes, applications 
in robot navigation. 


We conclude that the presented active recognition algorithm forms a basic solution 
for many different active fusion problems and its future extension to more sophisticated 
applications can be anticipated. 


Summary


This thesis describes an active object recognition system based on the integration of
imprecise information. The context of the problem treated here is given by the goal of
investigating the general structure of image understanding systems that are able to
integrate information and to select information sources actively. An intelligent system
combines knowledge about visual information sources, different levels of abstraction in
problem representation and object matching, as well as image analysis processes, in order
to integrate the visual information originating from actively selected sources. The present
system constitutes a partial solution of this research programme in the domain of active
object recognition.


The developed system is able to resolve ambiguous classification results by actively
repositioning the camera to take further images. Although the approach builds on a
view-based object representation (the parametric eigenspace), the action planning is
largely independent of any details of the object recognition environment.


In order to find the most promising action, the system evaluates the expected utility of
all available actions. Various measures of uncertainty and imprecision are used to estimate
the utility of potential actions. Such measures are reviewed or newly introduced in
separate chapters of this thesis. In this mainly theoretical part of the document, the
problem of estimating the quality of obtained classification results is treated in an
abstract manner. The results of this analysis can therefore be applied in a context
reaching well beyond the application example of active object recognition considered here.


The following innovations are presented in this thesis:


• the construction of a new planning algorithm for active object recognition, as well as


• the comparison of different implementations of this algorithm, based on different
theories for modeling uncertain knowledge, and


• the introduction and application of new measures of uncertainty and imprecision.


The recognition rate achieved by the system is investigated experimentally on realistic
objects. It is shown that active fusion can solve object recognition problems that remain
unsolvable for static approaches. A comparison with systems that choose their actions
randomly demonstrates that planning the next action is necessary and has an extremely
beneficial effect on the recognition rate.


Simpler single classifiers can be used if several observations are integrated. A feature
space of dimension 3 suffices to recognize 14 objects, each seen from 72 different views.
Despite this small number of features, the system is able to identify all objects
unambiguously and to estimate their pose precisely. The chosen active approach thus paves
the way for very large object databases that can recognize far more objects than current
systems.


The naive probabilistic implementation achieves the highest recognition rate when the
outlier rate is low. At higher error rates of the object recognition system, outliers can
be suppressed very effectively by information integration schemes based on averaging. In
this way a robust system can be built at limited cost.


The described approach to the active selection of actions has great potential for
applications that go beyond object recognition. Some obvious extensions are discussed:
other feature spaces, multi-sensor systems, scenes with multiple objects, and applications
in robot navigation.


The presented active object recognition algorithm constitutes a basic solution for a wide
variety of active information fusion problems. Its future extension to more complex
application areas can be anticipated.


Acknowledgements 


Many people have contributed during the development of this thesis and I would like to 
thank them all. First of all I would like to express my sincere gratitude to Axel Pinz and Is- 
abelle Bloch, my supervisors. I am very much indebted to both for a multitude of reasons. 


Axel has given me the opportunity to switch from physics to computer vision and 
he has allowed me to join a team that works on a subject which is probably closest to 
his heart. He has excelled in providing support at every stage of my thesis. I feel truly 
thankful to Axel for giving me the freedom and encouragement to pursue this work. 


I had the great pleasure of working with Isabelle during autumn 1997 and summer 1998.
Isabelle is a highly competent researcher and this experience has contributed enormously
to my understanding of uncertain information fusion. I deeply appreciate her interest in
my work, which she has expressed numerous times through extremely helpful discussions.


Axel should also be given credit for having gathered a group of
such truly insightful students. In particular, I feel very much indebted to Lucas Paletta,
Manfred Prantl and Harald Ganster. Manfred was the first of our group to envisage the 
idea of using the classic parametric eigenspace [88] for active vision purposes in a manner 
similar to Murase’s and Nayar’s idea for static off-line illumination planning [89]. Man- 
fred’s idea served as a real catalyst that allowed me to focus thoughts on the important 
issues of view-based active recognition [5]. Lucas has stimulated the use of probability 
distributions in eigenspace which he used in a neural architecture for reinforcement learn- 
ing schemes [97]. Lucas has also provided the training data for the described experiments. 
The algorithm for active fusion presented in part 2 was first designed in a fuzzy fusion
framework. Discussions with Manfred helped me greatly in translating the fuzzy
version into the language of probability theory [27, 26]. Later on, the same algorithm was
implemented using possibility theory and Dempster-Shafer theory of evidence [28].
For the latter implementation Harald provided many helpful comments. Manfred
has also made the algorithm easily accessible through a user friendly graphical interface 
which he had originally designed for graph based object recognition experiments. This has 
contributed a lot to the broader acceptance of the algorithm as a valid tool for planning in 
active object recognition. All these efforts reflect the strong commitment of Lucas, Man- 
fred and Harald and I would like to express my deep appreciation of their contributions. 
Thank you to the whole stimulating group at the Institute for Computer Graphics and
Vision.

Graz, June 1999




Contents


1 Introduction
1.1 Imperfect Information and How to Model It
1.2 Information Fusion
1.3 Active Fusion
1.4 The Scope and Structure of this Work


2 Object Recognition
2.1 Object Recognition Approaches
2.1.1 Object Representations
2.1.2 Object Recognition Algorithms
2.1.3 Generic vs. Specific Object Recognition
2.2 Eigenspace Based Object Recognition


3 Active Fusion for Object Recognition
3.1 Active Fusion in Object Recognition
3.2 Related Research in Active Object Recognition
3.3 Active Object Recognition
3.3.1 View Classification and Pose Estimation
3.3.2 Information Integration
3.3.3 View Planning
3.4 The Complexity of View Planning
3.5 Discussion and Outlook


4 Defining Imprecision and Uncertainty
4.1 Terminology
4.1.1 Dubois and Prade’s Classification
4.1.2 The Classification of Klir and Yuan
4.1.3 Our Choice of Terminology
4.1.4 Basic Requirements for Measures of Imprecision
4.2 …


5 Measuring Fuzziness
5.1 A Review of Measures of Fuzziness
5.2 Geometrically Inspired Measures of Fuzziness
5.2.1 Measuring the Distance from Crisp Sets
5.2.2 Measuring the Distance from the Center Set

6 Measuring Non-Specificity
6.1 A Review of Measures of Non-Specificity
6.1.1 Measures of Non-Specificity in K_…
6.1.2 Measuring Non-Specificity in K_…
6.1.3 Measures of Non-Specificity in K_…
6.1.4 A Slight Extension of Existing Measures of Non-Specificity
6.2 New Measures of Non-Specificity
6.2.1 Measuring Non-Specificity through Distances in K_…
6.2.2 Measuring Non-Specificity with Potentials in K_…
6.3 Summary and Discussion

7 Measuring Numerical Imprecision
7.1 Numerical Imprecision for Cyclic Variables
7.2 The Problem with the Variance
7.3 Summary and Discussion

8 Case Studies in Active Fusion for Object Recognition
8.1 Probabilistic Active Fusion
8.1.1 Probabilistic Object and Pose Classification
8.1.2 Probabilistic Product Fusion
8.1.3 Probabilistic View Planning
8.2 Possibilistic Active Fusion
8.2.1 Possibilistic Object and Pose Classification
8.2.2 Possibilistic (Minimum) Fusion
8.2.3 Possibilistic View Planning
8.3 Fuzzy Active Fusion
8.3.1 Fuzzy Object and Pose Classification
8.3.2 Fuzzy Fusion Operators
8.4 Evidence Theoretic Active Fusion
8.4.1 Evidence Theoretic Object and Pose Classification
8.4.2 Fusion in Evidence Theory
8.4.3 View Planning with Evidence Theory
8.5 Discussion
8.5.1 Object and Pose Classification
8.5.2 Fusion
8.5.3 View Planning




9 Experiments
9.1 What Sort of Experiments and Why?
9.2 The Experimental Environment
9.2.1 The Main Database of Object Models
9.2.2 The Extended Database of Object Models
9.3 An Exemplary Sequence of Actions
9.4 Conjunctive Active Fusion
9.5 Disjunctive Active Fusion
9.6 Averaging Active Fusion
9.7 Different Measures of Imprecision
9.8 Comparing Different Measures of Non-Specificity
9.9 Discussion


10 Critical Discussion and Outlook 




Chapter 1 


Introduction 


In image analysis huge quantities of uncertain data have to be processed and classified 
quickly and efficiently. Animals and human beings are able to perform this task without 
conscious effort. Today’s machine vision systems sometimes display remarkable capabilities
for certain well-defined tasks (e.g. in industrial applications), but the problem
of automatic image interpretation in general has not yet found a satisfactory solution. A 
major problem to overcome is to deal with the imperfect nature of the information that 
can be extracted from the images. 


1.1 Imperfect Information and How to Model It 


Imperfect information is present at various levels of the image interpretation chain. A 
complete image recognition system is usually viewed as consisting of three levels, namely, 
low level, mid level, and high level. The low level routines perform image processing 
tasks such as enhancement, noise reduction and primitive feature extraction. Mid level routines
extract more complicated features (e.g. by grouping more primitive features) and
may perform preliminary classification tasks. The high level tasks map the extracted 
(compound-)features to stored object-models and may eventually come up with a de- 
scription of the observed scene. Within this hierarchy imperfect knowledge is caused by 
various factors a few of which are (see also [103]) 


• Finite geometric and radiometric resolution of sensors, causing e.g. mixed pixels.


• Sensor noise and variations in the environment and the object parameters (lighting,
object size, details of appearance etc.).


• Finite precision in calibration, motions etc.


• Processing errors in feature extraction.


• Classification errors.


• Conflicting information when querying various sensors.




• Not all the additional knowledge necessary to solve the ill-posed problem of recovering
a 3D scene description from a 2D image may be available, or it may not be usable
exhaustively. Heuristics are usually necessary to search huge model databases.


Until the mid-80’s conventional approaches to image analysis and recognition were 
usually based upon segmenting the image into meaningful primitives (regions, curves, 
lines, edges, corners) and computing various unary features (e.g. area, perimeter, length 
etc.) and binary features (angles, distances etc.). Decision rules and grammars were in 
wide use to describe, interpret and classify the extracted features [56]. Notwithstanding 
the previously mentioned presence of imperfect knowledge, in a conventional system each 
of these operations involved crisp decisions. Since any decision made at a particular level 
has an impact on all higher-level activities, conventional systems multiplied the
shortcomings of low-level feature extraction.


This has stimulated considerable research on the application of theories designed
to handle imperfect information to image processing tasks. Stated in terms of soft
computing, uncertainty should be modeled, alternatives should be left open, and a ‘hard’ 
decision should be drawn only towards the end of the processing. Consequently we have 
been witnessing an ever growing interest in applying tools such as probabilistic reasoning, 
evidence theory and fuzzy set theory to various problems of computer vision in the last 
decade. In this thesis all three theories will be used for the implementation of an active 
vision system. A certain familiarity with these theories of uncertain knowledge will be 
assumed in the following. 
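The cost of committing to crisp decisions too early can be made concrete with a toy example. The sketch below is an illustration only, assuming three hypothetical object labels, made-up per-view confidence scores and statistically independent views; it is not taken from the system described in this thesis. One pipeline makes a crisp decision for each view and then takes a majority vote; the other keeps all alternatives open, fuses the scores by summing log-scores (equivalent to a product for the final decision) and decides only at the end.

```python
import numpy as np

# Hypothetical object labels and made-up per-view confidence scores.
objects = ["cup", "mug", "vase"]
views = np.array([
    [0.40, 0.35, 0.25],   # view 1: weakly prefers "cup"
    [0.38, 0.34, 0.28],   # view 2: weakly prefers "cup"
    [0.05, 0.90, 0.05],   # view 3: strongly prefers "mug"
])

def crisp_then_vote(scores):
    """Commit to a hard label per view, then take a majority vote."""
    labels = scores.argmax(axis=1)
    counts = np.bincount(labels, minlength=scores.shape[1])
    return objects[int(counts.argmax())]

def soft_then_decide(scores):
    """Keep alternatives open, fuse log-scores (assumes independent views), decide last."""
    fused = np.log(scores).sum(axis=0)
    return objects[int(fused.argmax())]

print(crisp_then_vote(views))   # -> cup
print(soft_then_decide(views))  # -> mug
```

The crisp pipeline discards how strongly the third view supports “mug”, so two weakly confident views outvote it; the deferred decision retains that information until the end.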


1.2 Information Fusion 


When modeling uncertainty, researchers aim at making recognition tasks more robust by 
keeping various possible interpretations until a final decision is made. The advances in 
sensing technologies have opened an alternative route to improve robustness: by combining
information extracted from various sources or with different routines, the system’s final
output is based upon the “opinion” of multiple “decision makers”. The idea of combining
features and using combined decisions is motivated by the fact that different process
designs and different sensor characteristics will usually result in different weaknesses and 
strengths of image analysis algorithms. This allows for the compensation of weaknesses 
of interpretations relying on single observations by the output of other routines that use 
a more appropriate input or processing scheme [2, 37, 59, 1, 104]. The combined informa- 
tion items may be crisp [40] but the full power of information fusion in computer vision 
is exploited only if the system takes care of the inherent imperfection of the fused infor- 
mation [23]. We will be concerned only with fusing imprecise information in the following. 
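A minimal sketch of two decision-level fusion rules may help to fix ideas. It assumes that each source delivers a discrete probability distribution over the same object hypotheses, so that no registration step is needed; the function names and numbers are hypothetical. The normalized product corresponds to naive probabilistic (conjunctive) fusion under an independence assumption, while the arithmetic mean is a simple averaging rule.

```python
import numpy as np

def product_fusion(dists):
    """Conjunctive (naive probabilistic) fusion: normalized elementwise product."""
    fused = np.prod(dists, axis=0)
    return fused / fused.sum()

def average_fusion(dists):
    """Averaging fusion: arithmetic mean of the per-source distributions."""
    return np.mean(dists, axis=0)

# Hypothetical class distributions over three objects from four sources.
# The last row is an outlier that nearly rules out object 0.
observations = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
    [0.01, 0.90, 0.09],
])

print(int(np.argmax(product_fusion(observations))))      # -> 1
print(int(np.argmax(average_fusion(observations))))      # -> 0
print(int(np.argmax(product_fusion(observations[:3]))))  # -> 0
```

With the outlying fourth distribution included, the product is dominated by its near-zero entry for object 0 and favors object 1, whereas the average still favors object 0; without the outlier both rules agree. This mirrors, in miniature, the trade-off between conjunctive and averaging fusion reported in the abstract.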


The three levels of general recognition systems are reflected in three fusion levels which 
have been categorized by Dasarathy [40] according to their I/O-characteristics. In figure 
1.1 the data-level is followed by the feature level which is followed by the decision-level. 
These three levels correspond to the low level, mid level and high level information that is
being processed. Indeed, from an information fusion point of view most object recognition
systems extract class and pose estimates through hierarchical fusion processes.

Figure 1.1: Alternative fusion input-output characterizations from Dasarathy [40].


Information fusion plays an important role in many different fields of computer science. 
In image processing applications it is often necessary to register the information provided 
by different sensors before fusing it. Prior registration is generally considered unavoidable 
at the data level (pixel and low level features) and at the feature level. At the highest 
level the decisions for object hypotheses are combined. At this level exact registration may 
become less important for two reasons. First, the information already represents
relatively “abstract” quantities (such as object hypotheses) that may exist for every image.
In our object recognition experiment to be discussed below the registration step will not be 
necessary for exactly that reason. Another reason why registration may not be necessary is 
that very often structural information is more abundant at higher levels and registration 
may be performed implicitly. To sum up, solutions for particular problems (such as 
those we will be dealing with in this work) can be found without solving the registration 
problem. In many other settings, however, the registration step is unavoidable and poses 
tough problems. One example is the analysis of multi-object scenes, a field where no 
completely satisfactory solutions have been found until now. 


1.3 Active Fusion


The idea of making vision systems active to solve problems that are ill-posed, nonlinear or 
unstable for a passive observer has been advocated forcefully since the late 80’s by Aloi- 
monos et al. [4, 2] and Bajcsy [7, 8]. These investigations have also motivated a broader 
view-point on information fusion. Active fusion as suggested by Pinz et al. [103] extends 
the paradigm of information fusion, being not only concerned with the methodology of 
how to combine information, but also introducing mechanisms in order to select the in- 
formation sources to be combined. This method is expected to lead to reliable results at 
reasonable cost as compared to a ‘brute force’ combination of all available data sources.
After the introduction of the ideas related to active fusion, active approaches to remote
sensing [103], illumination planning [107] and object recognition tasks [26, 27, 28, 108]
have been undertaken to explore the merits and limitations of the active fusion framework.


Figure 1.2: The concept of active fusion controlling a general image understanding framework
from Pinz et al. [103].
Figure 1.2 illustrates the ideas behind active fusion as presented by Pinz et al. [103].
Active fusion steps are assumed to perform the control task of a general image understand- 
ing framework. The active fusion component constitutes a kind of expert system/control 
mechanism, which has knowledge about data sources, processing requirements, and about 
the effective selection and combination of multi-source data. The upper half of the draw- 
ing corresponds to the real world situation, while the lower half reflects its mapping in 
the computer. Boxes and ellipses denote levels of representation and levels of processing, 
respectively. Solid arrows represent the data flow, dashed ones the control flow in the
image understanding system. The range of actions available to the active fusion component
includes the selection of viewpoints (i.e. scene selection and exposure), the activation of 
image analysis modules (e.g. segmentation, grouping), up to the direct interaction with 
the environment (exploratory vision). The individual modules are activated by the active 
fusion component and upon termination, they report their results including confidence 
measures and an indication of success or failure. These partial results are then integrated 
into the system’s current description of the situation (image, scene, world description).
In [29] an active fusion cycle has been designed that follows closely the expert system 
paradigm. We will investigate active fusion in the framework of object recognition using 
the parametric eigenspace and relying on the explicit optimisation of a criterion used for 
action selection. 
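The idea of selecting the next information source by optimising a criterion can be sketched in a few lines. The example below is a generic Bayesian stand-in, not the planning algorithm developed in this thesis: it assumes discrete object hypotheses, a known (hypothetical) table P(outcome | object) for each candidate action, and uses the expected Shannon entropy of the posterior as the quantity to minimise. All names and numbers are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def expected_posterior_entropy(belief, likelihood):
    """
    belief:     shape (n_objects,), current P(object)
    likelihood: shape (n_objects, n_outcomes), P(outcome | object) for one action
    Returns the expectation over outcomes of H(P(object | outcome)).
    """
    joint = belief[:, None] * likelihood      # P(object, outcome)
    p_outcome = joint.sum(axis=0)             # P(outcome)
    expected = 0.0
    for k, pk in enumerate(p_outcome):
        if pk > 0:
            expected += pk * entropy(joint[:, k] / pk)
    return expected

def select_action(belief, action_models):
    """Return the index of the action with the lowest expected posterior entropy."""
    scores = [expected_posterior_entropy(belief, L) for L in action_models]
    return int(np.argmin(scores)), scores

# Hypothetical setup: two objects that look alike from the current viewpoint.
belief = np.array([0.5, 0.5])
action_models = [
    np.array([[0.5, 0.5], [0.5, 0.5]]),   # action 0: uninformative view
    np.array([[0.9, 0.1], [0.1, 0.9]]),   # action 1: discriminating view
]
best, scores = select_action(belief, action_models)
print(best)  # -> 1
```

Action 0 leaves the two hypotheses as ambiguous as before (expected entropy 1 bit), while action 1 is expected to reduce the entropy to about 0.47 bits, so it is selected. Other measures of non-specificity or numerical imprecision could be plugged into the same loop in place of the entropy.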
Figure 1.3: Organization of the thesis.


1.4 The Scope and Structure of this Work 


The main topic of this thesis is uncertain information fusion for active object recognition. 
An algorithm for fusing uncertain information within an active recognition experiment 
will be developed and implemented using various uncertainty calculi. The thesis is of 
a modular nature with various inter-related parts. Both from a conceptual and from a 
technical point of view there are strong links between the parts of the thesis. Figure 1.3 
is intended to serve as a guide for the reader to relate the different issues to be discussed. 


• Beginning with chapter 2 we are concerned with the issue of object recognition in
general and the chosen recognition system in particular. On top of that system an
active fusion algorithm will be built in chapter 3. The general outline of
the active fusion algorithm will be presented assuming fuzzy aggregation schemes for
information integration. These two fundamental and partially introductory chapters
set the stage for all further discussions.


For the following chapters the reader may take two separate routes. Those who are 
mainly interested in active recognition may prefer to follow the right path in figure 
1.3 and continue in chapter 8 with the discussion of other implementations of the 
active fusion algorithm. In particular, a probabilistic, a possibilistic and an evidence 
theoretic approach will be discussed. 


The left route in figure 1.3 first takes the reader to a theoretical discussion of measures
of imprecision before continuing with the actual implementation of the active fusion
algorithm. In particular, measures of non-specificity and numerical imprecision for
fuzzy sets, possibility and probability distributions are reviewed and/or established.
The need for measuring imprecision will arise naturally in chapter 3 during the
discussion of view planning. Since the goal is to recognize an object and to determine
its pose, the utility of an action will usually be related to the increase in numerical
precision of the pose estimate and to the decrease in non-specificity
of the object hypotheses. Various measures will subsequently be used for the active
selection of the next action and to check termination criteria. Even though the
discussion of measures of imprecision is clearly motivated by the active vision
application considered here, it is kept rather theoretical in order to emphasize the
general nature of the problem of measuring imprecision.
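To make the two kinds of imprecision concrete, the sketch below computes one established example of each. These are standard measures from the literature, used here as assumptions for illustration and not necessarily the measures reviewed or introduced in the following chapters. The U-uncertainty of Higashi and Klir quantifies the non-specificity of a normalized possibility distribution over object hypotheses, and the circular variance (one minus the length of the mean resultant vector) quantifies the spread of a pose estimate for a cyclic variable such as a rotation angle. The function names are hypothetical.

```python
import math

def u_uncertainty(possibilities):
    """
    U-uncertainty (non-specificity, in bits) of a normalized possibility
    distribution: sum_{i=2}^{n} (pi_i - pi_{i+1}) * log2(i), with the values
    sorted in decreasing order and pi_{n+1} = 0.
    """
    pi = sorted(possibilities, reverse=True) + [0.0]
    if not math.isclose(pi[0], 1.0):
        raise ValueError("possibility distribution must be normalized")
    # pi[i - 1] is pi_i in 1-based notation.
    return sum((pi[i - 1] - pi[i]) * math.log2(i) for i in range(2, len(pi)))

def circular_variance(angles_deg, weights=None):
    """
    1 - length of the weighted mean resultant vector: 0 when all mass sits
    on one angle, close to 1 when the angles are spread around the circle.
    """
    if weights is None:
        weights = [1.0] * len(angles_deg)
    total = sum(weights)
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights)) / total
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights)) / total
    return 1.0 - math.hypot(c, s)

print(u_uncertainty([1.0, 0.0, 0.0, 0.0]))     # -> 0.0 (fully specific)
print(u_uncertainty([1.0, 1.0, 1.0, 1.0]))     # -> 2.0 (log2 of 4 hypotheses)
print(round(circular_variance([350, 10]), 4))  # -> 0.0152
```

For two pose hypotheses at 350° and 10° the circular variance is small, as it should be, whereas the ordinary variance of the numbers 350 and 10 would be very large; this is one reason why cyclic variables need special treatment.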


The topics chosen in this thesis address important issues in current research. 


Active fusion is an important extension of the passive or static fusion scenario because,
by actively planning the next action, the system behaves like an intelligent agent that
is less likely to become confused by contradictory intermediate results and achieves
its aim (recognition) more quickly. This is made possible by deferring final decisions
until enough information has been gathered. Active fusion leads to more reliable
results with a minimum number of additional observations.


Certain tasks cannot be accomplished by static or passive sensors at all unless one 
uses a very large number of static sensors. One example that will be considered is 
the discrimination of very similar objects that can be told apart only from a few viewpoints.
The active vision system described below uses the information at hand to make 
preliminary class and pose estimates and then decides where to position the camera 
to improve these estimates. 


The presented approach relies on an eigenspace representation and is useful for prod- 
uct inspection, repair, assembly and grasping applications. However, the core active 
fusion scheme is of a more general nature and may be applied to object recognition 
tasks that are based upon different internal model-representations and recognition 
algorithms or even in robotics applications where the goal may be trajectory plan- 
ning or position estimation. 
The impact of the theoretical developments in the chapters on measures of imprecision
may be of a much wider scope than is covered by this thesis. 


Measures of imprecision are relevant to obtain an internal overall estimate of the 
quality of classification results. For active systems it is vital to assess somehow the 
quality of expected results. Only if the utility of envisaged actions can be roughly
estimated does it make sense to plan actions at all. In addition, both for static and for
active fusion schemes these measures can be used to evaluate termination criteria
such as “terminate successfully if the pose estimation is sufficiently precise and the
recognition module favors a very specific object hypothesis”.


These measures can also be used in optimisation schemes, arising for example in 
relaxation labeling [30] as well as in possibilistic clustering [73] or in the evaluation 
of fuzzy clustering results [15]. The measures may be directly introduced into the 
objective function or they may be used to evaluate the quality of the obtained 
clustering. For instance, in [30] it has been proposed to allow for fuzzy labelings 
in relaxation labeling by adding a term measuring non-specificity in the objective 
function. In possibilistic clustering the same strategy can help to avoid very non- 
specific results such as for example multiple coincident clusters which are obtained 
frequently with present possibilistic clustering schemes [74, 9]. 


Still other uses for measures of imprecision can be envisaged in context dependent 
fusion schemes [24]. There they can be used to select the appropriate fusion operator 
depending on how much non-specificity is present. 


Summing up the major contributions of this thesis, we present


• the exemplary and detailed design of an active fusion module for the task of
  appearance based object recognition,


• an experimental study on the effect of different fusion schemes for the proposed
  active control module, and


• an exploration of measures of fuzzy imprecision and the development of new
  measures, some of which are subsequently applied to active recognition.


Chapter 2 


Object Recognition 


The problem of recognizing three-dimensional objects from two-dimensional images is one
of the most challenging tasks in machine vision. In order to relate the methodology chosen
in this thesis to other approaches, let us briefly review some major lines of research in
three-dimensional object recognition and their embedding in the most influential general
vision research methodologies. The following discussion is not intended to cover the
subject in full depth but should help relate the described work to the “bigger picture”
in vision research. For more elaborate overviews of object recognition strategies the
reader is referred to [61, 105, 6, 128, 110]. Since object recognition forms an important
branch of vision research, we begin with a quick overview of the most influential
paradigm for vision research in general.


The Marr paradigm for research in vision 


Vision has been defined as the process that, given a set of images, creates a complete and
accurate representation of the scene and its properties [83]. Research in vision has a long
history, beginning with the considerations on perception by ancient Greek philosophers
[100]. In the twentieth century the possibility of imitating the visual capabilities of living
beings by machines attracted the attention of scientists from various disciplines,
resulting in intensive interaction among fields like computer science and neurobiology
[115, 3]. The first successful solutions of constrained problems in automated object
recognition were obtained during the late 1960s and early 1970s [113]. Soon after these
initial attempts, a dominant paradigm for computer vision research emerged in the 1970s
out of the work of David Marr and his colleagues [83, 84]. Marr's paradigm for research in
computer vision made explicit the need to answer the following three major questions:


• Computational theory: What is the goal of the computation? Why is a particular
  theory appropriate? What is the logic of the employed strategies?


• Representation and algorithm: What is the representation for the input and output,
  and what is the algorithm for the transformation?


• Hardware implementation: How can the representation and algorithm be realized
  physically?
The above methodology for machine vision research has been extended by Aloimonos
to include the level of stability analysis [2]: How stable are the theory, algorithm and
implementation in the presence of noise in real images? Concerning the issue of stability,
we will see below that fusing different observations provides a significant improvement
over systems relying on single observations.


According to Aloimonos, the above methodology is usually followed either top down or
bottom up. The top-down approach starts with a computational theory for a general problem
in vision and, making further assumptions to simplify the theoretical framework, solves
a specific problem after developing an appropriate representation and algorithm
(e.g. structure from motion assuming rigidity). To this day a large percentage of fully
developed computational theories deal with the physical processes associated with vision.
Hence Aloimonos links the top-down approach to the so-called reconstruction school,
which tries to reconstruct various physical parameters of the outside world or of the image
so that they may eventually be used in recognition systems¹.


On the other hand, one may follow the Marr paradigm bottom up by developing actual
systems which perform certain practical tasks without an underlying complete
computational theory. The rationale for following Marr's research outline from the
bottom up lies in the assumption that the experience gained by implementing actual
systems will eventually provide decisive hints for a complete computational theory of
vision. The work presented in this thesis belongs to this category. We would like to
emphasize that a solution to a specific problem can certainly have a solid theoretical
foundation within its domain of applicability. However, the employed computational
theory is not complete in the sense that it does not offer a complete solution to a
fundamental problem in vision research. Aloimonos links this methodology to the
“recognition school”, which is based on building systems that rely on the recognition of
descriptions of the objects we see. This approach has been criticized on the grounds that
it produces results of too narrow a scope instead of general principles. However, if it
produces anything at all, it is guaranteed to produce results that can actually be applied [2].


We will not deal with the general problem of vision, which aims at analyzing complete
scenes of arbitrary content. Instead we restrict the task to the problem of recognizing
single objects in images, given some knowledge of how certain objects may appear. Even
this restricted problem touches many general issues that every object recognition system
has to solve.


2.1 Object Recognition Approaches 


There are various ways to categorize object recognition approaches. One may,
for example, distinguish broad classes defined by the


• internal representation used by the system,
• applied matching strategy and algorithms,
• generality of objects and object classes that can be recognized (generic vs. specific).


¹ However, top-down research (in the Marr paradigm) does not necessarily imply
reconstruction, as exemplified by Markov Random Field applications to model matching [79].


The notions of representation, algorithm and generality will provide natural means for 
us to relate the chosen methodology to other object recognition approaches. 


2.1.1 Object Representations 


We will briefly discuss each of the above issues, starting with the internal representation
of the objects. In particular we follow the common practice of discussing object-centered
and viewer-centered representations in object recognition separately. At the beginning
of his discussion of viewer-centered representations in object recognition, Basri states
one advantage of this distinction [10]: “Object-centered representations describe the shape of
objects using view-independent properties, while viewer centered representations describe
the way this shape is perceived from certain views. [...] The issue of representation is
crucial for object recognition. It determines the information that makes an object stand
out and the circumstances under which it can be identified. In addition, it divides the
computational process into its online components, the ‘recognition part’, and off-line
components, the ‘learning’ or ‘model-acquisition’ part.”


The Marr paradigm and related object-centered representations 


Representation                 Primitives
Image                          Intensity values
‘primal sketch’                Zero crossings, blobs, terminations,
                               edge segments, boundaries, groups
‘2½-D sketch’                  Local surface orientation,
                               distance from viewer,
                               discontinuities in depth and surface orientation
‘3-D model representation’     Hierarchically arranged 3-D models, each one based
                               on a spatial configuration of a few sticks or axes,
                               to which volumetric primitives are attached.


Table 2.1: The levels of representation in object recognition according to Marr [83].


The first and to this day most influential general paradigm for different levels of
representation in object recognition is also due to Marr [83]. Table 2.1 depicts the
suggested framework. It starts with the ‘primal sketch’, consisting of low-level features
that are extracted from the image. The next level is the ‘2½-D sketch’, a viewer-centered
description of visible surfaces. This representation has stimulated research on extracting
shape from X (shape from stereo, shading, etc.) and on range sensor development to obtain
the ‘2½-D sketch’. Matching requires transforming the ‘2½-D sketch’ into a representation
of the three-dimensional shape and spatial arrangement of the object. Consequently, at the
highest level of internal representation a 3-D model of the object is stored in an
object-centered coordinate system.


Following an initial suggestion of Binford [18], Marr and Nishihara proposed to represent
objects by 3-D models which themselves should be hierarchically composed of
(generalized) cylinders [84]. In the following years additional suggestions for the
representation of 3-D models were made. One may broadly classify them into quantitative
and qualitative representations: the fundamental building blocks of the 3-D model may be
represented quantitatively (e.g. through the parameters of primitives ranging in
complexity from polyhedral objects [64] through superquadrics [125] to generalized
cylinders [117]), qualitatively (e.g. through relational structures encoding adjacency,
parallelism, etc. [17]), or by a combination of both [102]. In any case the object
recognition system transforms the extracted image features and the internal 3-D model to
a common representation before matching. Quite often this amounts to transforming the
image features to an object-centered coordinate system. Consequently it has become common
practice to call representations which involve the explicit use of a 3-D model
“object-centered representations”² [61].


Usually all object models are composed of a limited set of basic primitives. Until the
mid 1980s this was motivated mainly by the reduction in operational complexity. In 1985
Biederman provided a sound theoretical basis for this approach [17]. His recognition-by-
components theory claims that most complex real-world objects can be represented and
recognized using only a limited number of qualitatively defined volumetric primitives
(so-called geons). The idea emerged from experiments which demonstrated that humans are
able to distinguish objects after recognizing these primitives through a small number of
non-accidental, viewpoint-invariant properties, such as straight vs. curved contours.
Biederman therefore also argued in favor of replacing the prevalent quantitative
measurements in table 2.1 by qualitative representations at higher levels.


In the following years the geon representation was used by many researchers. Bergevin
and Levine [14] made the first significant effort to build a vision system based on geons
(PARVO). Dickinson, Pentland and Rosenfeld [44] tried to extract simple volumetric
primitives from line drawings obtained from intensity images. For an overview of the
development see [42] and [61]. The extraction of geons from optical images has proven
to be extremely difficult, the major problem being robust image segmentation. Most
approaches relying on range images infer geons from a dense ‘2½-D sketch’³. This approach
alleviates the technical problems, and successful extraction of geons has been reported
provided that the image data is sufficiently ‘smooth’ [42, 111].


² Even though viewer-centered coordinates are necessarily used for lower representational levels.
³ These techniques do not strictly follow Biederman's theory, which suggests that orientation
and depth discontinuities (edges) should encode all the necessary information for geon extraction.


Object-centered representations describe the shape of objects using view-independent
properties. In the following we briefly review appearance based or viewer-centered
representations, which describe objects through their appearance as observed from
certain views.


Appearance Based Representations 


There is a traditional dichotomy between object-centered and appearance based
representations in object recognition [61]. Systems using viewer-centered or multi-view
representations rely on a collection of images of the object for the implicit description
of its shape and other three-dimensional features and do not use an explicit 3-D model of
the object. Some psychophysical evidence [131, 51] seems to indicate that the human visual
system does indeed also use a viewer-centered representation⁴. Although the various
appearance based approaches may differ substantially, they are all based on the premise
that object-centered three-dimensional object descriptions are not required. That is,
each object can be described to a sufficient degree of accuracy through some stored views,
and intermediate views and features can be computed whenever necessary without any
reference to a 3-D model (view interpolation [132, 77, 55, 123] or feature interpolation [88]).


A typical viewer-centered model consists of a set of features extracted for a number
of views of an object. These features may contain only information on the object's
radiometric appearance, or they may also include certain shape attributes, such as depth or
curvature. In terms of Marr's paradigm, the representation may include any of the levels
in table 2.1 up to the ‘2½-D sketch’.


Various other types of features (besides those suggested by Marr for the primal sketch
and the ‘2½-D sketch’) can be chosen. Usually one aims at features that are relatively
invariant to changes in lighting conditions, or one tries to model the image variations due
to changing light sources [90, 13]. Local features are often preferred over global image
descriptors in order to handle occlusion in multi-object scenes more efficiently [118].
If local features are used, they are usually grouped into geometrically or radiometrically
related perceptual entities, leading to relational representations for object views.
Only recently have approaches that are robust under occlusion been developed for certain
global image descriptors as well [78, 20].


Feature extraction is sometimes quite straightforward for appearance based approaches
because the difficult step from viewer-centered features to 3-D features is not necessary.
On the other hand, model matching becomes in general more complicated, since inherently
three-dimensional objects are represented by their 2-D views. In general many distinct
views need to be recorded, and the representation is not as concise as object-centered
3-D models⁵. A major problem is therefore the tractability of the model database. In order
to reduce space requirements, each characteristic view often represents an entire range of
viewpoints. Fortunately, for almost every object many features stay nearly constant during
small changes in viewpoint, with the exception of “degenerate views” at which perceptual
entities like faces appear or disappear [81, 88]. Thus the necessary set of views can
sometimes be compressed considerably, with a trade-off between the size of a description
and its accuracy [33]. Views are traditionally organized in aspect graphs [101, 54] or
networks [34], or additional views are interpolated among stored views [132, 88]. More often
than not, the management and organization of the usually large database of views requires
special off-line pre-processing techniques that allow faster on-line access to the relevant
features [85, 19, 88].


⁴ For example, humans recognize objects more easily when they are seen from familiar viewpoints.


We would like to conclude these comments on appearance based representations with
a short statement on the importance of appearance based and object-centered internal
representations. We believe that a combination of both will be a common characteristic
of successful future object recognition systems. In fact, both “schools” can supply
compelling psychophysical evidence for their point of view, and we have already witnessed
examples of a successful fusion of the two approaches. For instance, the system of
Dickinson, Pentland, and Rosenfeld recognizes geons by their characteristic views and then
assembles the volumetric parts into an object-centered description for comparison with
object models [44]. Furthermore, object-centered representations are usually needed for
the analysis of multi-object scenes: only an invariant three-dimensional object
representation can capture the inherently three-dimensional relations among different
objects. Notwithstanding this observation, the representation we have chosen for our
object recognition system is not object-centered but purely appearance based. The reasons
for this choice are detailed in section 2.3.


2.1.2 Object Recognition Algorithms 


The task of recognizing an object comprises finding a match between a model and an
image. The inputs are a library of object models and an image; the outputs specify the
identity, pose, and perhaps certainty of any objects recognized in the image. Each
recognition algorithm naturally depends strongly on the underlying representation.
Nevertheless, a few basic properties common to most algorithms can be outlined (see
table 2.2). The steps listed in table 2.2 are discussed in more detail in [105]. Not all of
them are implemented in every object recognition system. For example, the recognition
module we use proceeds directly from feature detection to matching. Various matching
strategies are described in detail in [128].


feature detection           signal processing to detect features (or primitives)
                            in the image and represent them as symbols
perceptual organization     identify meaningful groupings of features
indexing                    use these features to select likely models out of
                            a library of object models
matching                    find the best match between extracted features
                            and those of the selected models
verification                decide whether that match suggests that
                            the modeled object is present in the image


Table 2.2: Common processing steps in object recognition (from [105]).


Here we would like to stress another aspect, which is closely related to active vision:
the above steps are very often executed in a bottom-up manner, i.e. proceeding from the
image information to feature extraction, grouping, indexing, model matching and finally
verification. In contrast, a top-down strategy starts from higher representational
levels (e.g. the level of object models) and activates specific processing schemes to
answer high-level questions⁶. Quite often these questions arise in the context of
hypothesis verification, i.e. after preliminary results from a first bottom-up processing
attempt have been obtained. One such system has been described by Lowe (SCERPO [82]):
low-level features are clustered into perceptual groups, these groups are matched to 3-D
object models, and the resulting matches are finally verified in a top-down step.


⁵ Still, viewer-centered representations eliminate the need for elaborate computations to account
for self-occlusion, which can again simplify the task of matching considerably for single-object scenes.


An extension of this approach is the use of feedback loops in bidirectional systems
that allow for both top-down and bottom-up processing chains. In principle such feedback
loops can occur at every level of the recognition chain, and the active fusion framework
presented in fig. 1.2 contains possible feedback loops at every processing step. For
example, to verify edge features a system could decide to apply different algorithms or
to change the imaging conditions. In [109] it has been demonstrated that such decisions can
be made even before the first object hypothesis has been obtained.


The system presented below is bidirectional, with a single loop from the highest
level (object hypotheses) to the lowest level (image acquisition). Object hypotheses are
extracted from a given image in a purely bottom-up processing chain. The system uses
the current object hypotheses and its database of views to plan the next action. The
top-down action consists of repositioning the camera and acquiring the next image, which
is again processed bottom up. Every bidirectional system must include stopping criteria
and mechanisms that prevent it from looping indefinitely (combinatorial explosion).
Common stopping criteria, which will also be used below, are (i) successful recognition,
(ii) too many steps, and (iii) no expected improvement.
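The single loop and its three stopping criteria can be sketched in Python as follows. The callbacks `observe`, `fuse`, `expected_utility` and `recognized` are hypothetical placeholders for the recognition, fusion and planning modules. Representing actions as camera rotations in degrees relative to the current viewpoint is an assumption, and so are the thresholds and the toy normalized-product fusion rule, which is chosen only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

Belief = List[float]  # one confidence value per object hypothesis


def active_recognition(
    observe: Callable[[float], Belief],
    fuse: Callable[[Belief, Belief], Belief],
    expected_utility: Callable[[Belief, float], float],
    recognized: Callable[[Belief], bool],
    actions: Sequence[float],
    start: float = 0.0,
    max_steps: int = 5,
    min_utility: float = 1e-3,
) -> Tuple[Belief, int, str]:
    """Observe, fuse, plan and move until one of the stopping criteria holds.

    Returns the fused belief, the number of views used and the stopping reason.
    """
    position = start
    belief = observe(position)  # first view, processed bottom up
    steps = 1
    while True:
        if recognized(belief):
            return belief, steps, "recognized"               # criterion (i)
        if steps >= max_steps:
            return belief, steps, "too many steps"           # criterion (ii)
        best = max(actions, key=lambda a: expected_utility(belief, a))
        if expected_utility(belief, best) < min_utility:
            return belief, steps, "no expected improvement"  # criterion (iii)
        position = (position + best) % 360.0                 # top-down action
        belief = fuse(belief, observe(position))             # next view, bottom up
        steps += 1


def product_fusion(a: Belief, b: Belief) -> Belief:
    """Toy fusion rule for the example: normalised product of two beliefs."""
    p = [x * y for x, y in zip(a, b)]
    total = sum(p)
    return [x / total for x in p]
```

Checking criterion (iii) with the utility of the best available action means the loop stops as soon as no action is expected to help, even if the step budget is not yet used up.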


⁶ In the given context, bottom up and top down refer only to the processing chain from 2-D feature
extraction to object model matching. They are not related to the bottom-up and top-down research
methodologies distinguished by Aloimonos within the Marr paradigm [2].
Before discussing in more depth the choices we have made in designing our system, we
would like to conclude the outline of some general principles in object recognition with a
short comment on generic vs. specific object recognition.


2.1.3 Generic vs. Specific Object Recognition


A system performing specific object recognition is able to recognize only the objects used
for learning and building the model database. Even if an object (e.g. a table) belongs to
a more general object class (e.g. the class of all table-like objects) of which some
instances are already known, the system is usually unable to recognize instances other
than the learned ones. In contrast, a generic object recognition system can be defined as
a system that recognizes an object even though the encountered object is not (explicitly)
contained in the database of object models. Truly generic object recognition systems must
be able to generalize from a few known instances of an object category to other instances
of that class.


Part Based Generic Object Recognition 


Research in the direction of generic object recognition has often been based upon
decomposing the objects into primitives, the idea being that qualitative structural
descriptions of volumetric primitives facilitate incorporating higher degrees of generality
and abstraction into the system. For example, function based recognition systems, whose
final classification result is based upon the inferred function that parts of the object
may serve, are usually built on top of an extraction module for volumetric primitives
(Stark and Bowyer [126]). A notable exception is the work of Strat and Fischler, who claim
to “eliminate the traditional dependence on stored geometric models” by using knowledge
of the context in which certain objects may be found [127]. In general, however, many
directions of research in generic object recognition rely on qualitatively described
volumetric primitives.


On the other hand, it has also been noted that purely qualitatively defined primitives
are not fully adequate for describing and distinguishing objects which are similar in their
coarse part structure [111, 42]. Furthermore, composing objects from qualitative
primitives may also necessitate extending geon theory. Whether it is possible to specify
uniquely the locations and extensions of connections between parts of the object in purely
qualitative terms is still an open question. In other words, can or should generic objects
(including natural objects, in contrast to man-made objects) always be decomposed into a
set of (only qualitatively defined) basic primitives? In fact, it is far from trivial to
decide what kind of 3-D relationships between volumetric parts can be extracted from
static images and how these relationships should be used to organize the model database.
The return to more quantitative high-level features such as generalized cylinders or CAD
models for the extraction of primitives and/or the composition of objects (which should not
be necessary according to Biederman's hypothesis but lies closer in spirit to Marr's
original proposal) is partially motivated by these observations.
Viewer Centered Generic Object Recognition


Turning to purely viewer-centered object recognition schemes, it can be noted that the
question whether and how appearance based methods may be able to recognize object classes
instead of object instances quite often receives a negative answer [61]. It is argued that
models based solely on 2-D information are too hard to generalize, whereas
three-dimensional models can more easily cover object classes by using appropriate
parameterizations or qualitative part decompositions. A contrary viewpoint is expressed,
for example, by Edelman, whose work aims at building appearance based systems that perform
categorization rather than recognition of objects. He shows that, although categorization
cannot rely on interpolation between stored examples, knowledge of several representative
members, or prototypes, of each of the categories of interest can still provide the
necessary computational substrate for the categorization of new instances. Consequently,
the resulting representational scheme is based on similarities to prototypes [50, 52].
However, as these ideas have not yet been translated into computationally efficient
implementations, we will use a more classic representation of object views for the active
system described below. Our system is therefore not able to generalize from learned
examples.


2.2 Eigenspace Based Object Recognition 


The above overview of general research directions in object recognition will be used below
to clarify the role of the chosen object recognition approach in relation to alternative
approaches. Before doing so, let us discuss the details of the chosen recognition algorithm.
This section contains a short review of the eigenspace technique in the spirit of Nayar
and Murase [88]. In section 2.3 we will then discuss the reasons for the particular
choice we have made.


Eigenvector decomposition has long proven to be an effective tool for structuring
high-dimensional representations of phenomena which are intrinsically low-dimensional.
In computer vision, eigenvector analysis of image sets has been used successfully,
for example, for automatic face recognition using “eigenfaces” [130], visual servoing for
robotics [89] and visual learning for object recognition [88].


The eigenspace approach requires an off-line learning phase during which images of
all objects from many different views are used to construct the eigenspace (see Fig. 9.5).
In subsequent recognition runs the test images are projected into the learned eigenspace
and the closest model point is determined. All images of all objects are brought to the
same size: for each digitized image the object is segmented from the background by assuming
small brightness variations for the background, the background is assigned zero brightness,
and the object region is re-sampled such that it fits a pre-selected image size. The
scale-normalized image is written as a vector $\hat{\mathbf{x}}$ by reading pixel brightness
values in a raster scan manner:

$$\hat{\mathbf{x}} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_N)^T \tag{2.1}$$
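The preprocessing just described can be sketched in plain Python. The sketch assumes a grey-level image given as a list of rows and a known, constant background level with tolerance `tol`. It also assumes an axis-aligned bounding box around the object and nearest-neighbour re-sampling of that box to a square of `size` × `size` pixels, which does not preserve the aspect ratio. The function name and all of these choices are illustrative assumptions, not the exact procedure of [88].

```python
from typing import List

Image = List[List[float]]  # grey values, one list per image row


def segment_and_resample(image: Image, background: float, tol: float,
                         size: int) -> List[float]:
    """Segment the object, zero the background, re-sample the object's bounding
    box to size x size pixels and return the raster-scan vector, cf. (2.1)."""
    rows, cols = len(image), len(image[0])
    # object pixels deviate noticeably from the (assumed constant) background
    mask = [[abs(image[r][c] - background) > tol for c in range(cols)]
            for r in range(rows)]
    obj_rows = [r for r in range(rows) if any(mask[r])]
    obj_cols = [c for c in range(cols) if any(mask[r][c] for r in range(rows))]
    if not obj_rows:
        raise ValueError("no object pixels found")
    r0, c0 = obj_rows[0], obj_cols[0]
    height = obj_rows[-1] - r0 + 1
    width = obj_cols[-1] - c0 + 1
    x_hat: List[float] = []
    for i in range(size):                  # raster scan: row by row ...
        r = r0 + (i * height) // size      # nearest-neighbour source row
        for j in range(size):              # ... and column by column
            c = c0 + (j * width) // size   # nearest-neighbour source column
            x_hat.append(float(image[r][c]) if mask[r][c] else 0.0)
    return x_hat
```

Because translation inside the image and the apparent object size are removed here, these parameters do not have to be sampled when the eigenspace is learned.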


Unwanted overall brightness changes due to variations in the ambient illumination and
aperture of the imaging system are eliminated by brightness normalization:

$$\mathbf{x} = \frac{1}{\|\hat{\mathbf{x}}\|}\,\hat{\mathbf{x}} \tag{2.2}$$

The universal image set $U = \{\mathbf{x}_{o_i,\varphi_j}\}$ is the set of all normalized
training images taken for all objects $o_i$ with pose, lighting and possibly other
parameters $\varphi_j$. A new image matrix is obtained by subtracting from each image the
average $\mathbf{c}$ of all images in $U$:


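$$X := \left(\bar{\mathbf{x}}_{o_1,\varphi_1}, \bar{\mathbf{x}}_{o_1,\varphi_2}, \ldots, \bar{\mathbf{x}}_{o_{n_o},\varphi_{n_\varphi}}\right) \tag{2.3}$$

with $\bar{\mathbf{x}}_{o_i,\varphi_j} := \mathbf{x}_{o_i,\varphi_j} - \mathbf{c}$. The image matrix
$X$ is of size $N \times (n_o n_\varphi)$, with $N$ denoting the number of pixels in each
image, $n_o$ the number of models and $n_\varphi$ the number of samples of the pose and
other parameters. Next, we define the $N \times N$ covariance matrix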


$$Q := X X^T \tag{2.4}$$

and determine the eigenvectors $\mathbf{e}_i$ and the corresponding eigenvalues $\lambda_i$⁷:

$$Q\,\mathbf{e}_i = \lambda_i\,\mathbf{e}_i \tag{2.5}$$

Since $Q$ is Hermitian, we may assume that $\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \delta_{ij}$.
We sort the eigenvectors in descending order of eigenvalues. The first $k$ eigenvectors are
then used to represent the image set $X$ to a sufficient⁸ degree of accuracy:


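$$\bar{\mathbf{x}}_{o_i,\varphi_j} \approx \sum_{s=1}^{k} g_s\,\mathbf{e}_s \tag{2.6}$$

with $g_s = \langle \mathbf{e}_s, \bar{\mathbf{x}}_{o_i,\varphi_j} \rangle$. We call the vector
$\mathbf{g}_{o_i,\varphi_j} := (g_1, \ldots, g_k)$ the projection of $\bar{\mathbf{x}}_{o_i,\varphi_j}$
into the eigenspace. Under small variations of the parameters $\varphi_j$ for a fixed object
$o_i$ the image $\mathbf{x}_{o_i,\varphi_j}$ will usually not be altered drastically. Thus for
each object $o_i$ the projections of consecutive images $\mathbf{x}_{o_i,\varphi_j}$ are located
on piece-wise smooth manifolds in eigenspace, parameterized by $\varphi_j$. Examples of such
manifolds are depicted in Fig. 9.3.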
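The learning steps (2.2) to (2.6) can be sketched in dependency-free Python. The input is one raster-scanned image vector per (object, pose) sample. The sketch computes the eigenvectors of $Q$ through the smaller matrix $X^T X$, following the route described in the footnote on the eigen-decomposition. It then chooses $k$ with the energy criterion from the footnote on "sufficient" accuracy, and it solves the small symmetric eigenproblem with a textbook cyclic Jacobi iteration. All function names, the default threshold 0.9 and the use of Jacobi rotations are assumptions made for this example. [88] discusses numerical techniques that are actually suitable for large image sets.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def normalize(x_hat: Sequence[float]) -> Vector:
    """Brightness normalisation (2.2): scale the raster-scanned image to unit length."""
    norm = math.sqrt(dot(x_hat, x_hat))
    if norm == 0.0:
        raise ValueError("image contains no brightness")
    return [v / norm for v in x_hat]


def image_matrix(images: Sequence[Sequence[float]]) -> Tuple[Vector, List[Vector]]:
    """Average image c and the columns x - c of the image matrix X (2.3)."""
    xs = [normalize(img) for img in images]
    c = [sum(pixel) / len(xs) for pixel in zip(*xs)]
    return c, [[v - m for v, m in zip(x, c)] for x in xs]


def jacobi_eigen(a: List[Vector], sweeps: int = 100) -> Tuple[Vector, List[Vector]]:
    """Eigenvalues and eigenvectors of a small symmetric matrix (cyclic Jacobi)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # rotation angle that annihilates a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P   (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # i-th column of V
    return [a[i][i] for i in range(n)], eigenvectors


def build_eigenspace(columns: List[Vector], threshold: float = 0.9) -> List[Vector]:
    """Eigenvectors e_1..e_k of Q = X X^T, computed via the small matrix X^T X."""
    m = len(columns)  # n_o * n_phi training images, assumed much smaller than N
    q_small = [[dot(columns[i], columns[j]) for j in range(m)] for i in range(m)]
    lams, vecs = jacobi_eigen(q_small)
    trace = sum(lam for lam in lams if lam > 0.0)  # Trace(Q) = Trace(X^T X)
    rows = list(zip(*columns))                      # rows of X, one per pixel
    basis: List[Vector] = []
    captured = 0.0
    for i in sorted(range(m), key=lambda idx: lams[idx], reverse=True):
        if trace <= 0.0 or lams[i] <= 1e-12 * trace:
            break  # the remaining directions carry (numerically) no energy
        inv_sqrt = 1.0 / math.sqrt(lams[i])         # e_i = lambda_i^(-1/2) X e~_i
        basis.append([inv_sqrt * dot(row, vecs[i]) for row in rows])
        captured += lams[i]
        if captured / trace > threshold:            # sum of k eigenvalues / Trace(Q)
            break
    return basis


def project(basis: List[Vector], x_bar: Sequence[float]) -> Vector:
    """Eigenspace coordinates g_s = <e_s, x_bar>, as in (2.6) and (2.8)."""
    return [dot(e, x_bar) for e in basis]
```

Working with $X^T X$ pays off whenever there are far fewer training images than pixels. In that case the matrix that has to be diagonalized is only $n_o n_\varphi \times n_o n_\varphi$.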


The manifold in eigenspace will usually inherit the topology of the manifold that is
defined through the physical motion of a point on the object when changing the physical
parameter $\varphi_j$. In particular, for cyclic variables (e.g. angular variables) the
manifold in eigenspace will be closed.


⁷ Since not all eigenvectors are needed, this does not have to be done for the large matrix $Q$
directly. Instead one may use the eigen-decomposition $\tilde{Q}\,\tilde{\mathbf{e}}_i = \tilde{\lambda}_i\,\tilde{\mathbf{e}}_i$
of $\tilde{Q} := X^T X$ to obtain $n_o n_\varphi$ eigenvectors and eigenvalues of $Q$ through
$\lambda_i = \tilde{\lambda}_i$, $\mathbf{e}_i = \tilde{\lambda}_i^{-1/2} X \tilde{\mathbf{e}}_i$. $\tilde{Q}$ may still be
a rather large matrix requiring special treatment. See [88] for a discussion of various numerical techniques.

⁸ Sufficient in the sense of sufficient for disambiguating the various objects. Quantitatively we demand
$\sum_{i=1}^{k} \lambda_i / \mathrm{Trace}(Q) > \text{threshold}$.


In addition, the dimensionality of the manifold will reflect the physical degrees of
freedom expressed in the parameters $\varphi_j$. Under fixed lighting conditions rigid
objects have six degrees of freedom. If the objects are at a fixed position in space,
three degrees of freedom remain to characterize the pose. If we relax the condition of a
single object, we introduce at most six further degrees of freedom for each additional
rigid object (including all those images where one object occludes the other). Each point
light source can potentially introduce another six degrees of freedom because of its pose
and location, and further degrees for internal parameters of the source (intensity). These
are upper estimates. For convex Lambertian surfaces it can be shown that the additional
dimensionality of the manifold due to illumination parameters is 3, no matter how many
light sources are used [90].


In our experiments we will use only rotations around the z-axis. Thus the manifolds
are 1-dimensional and parameterized by $\varphi$. The location parameters do not enter
because all variations in the two dimensions parallel to the camera plane and inside the
view volume are accounted for by segmenting the object from the background. Depth
is eliminated by re-sampling to a fixed image size, assuming a weak-perspective camera
model.


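Using the approximation (2.6), it is easy to show that for two image vectors $x_{o_i,\varphi_j}$ and $x_{o_r,\varphi_s}$ we have

$$ \| x_{o_i,\varphi_j} - x_{o_r,\varphi_s} \|^2 = 2 - 2\,\langle x_{o_i,\varphi_j}, x_{o_r,\varphi_s} \rangle \approx \| g_{o_i,\varphi_j} - g_{o_r,\varphi_s} \|^2 . \tag{2.7} $$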


The above equation shows that maximizing the correlation coefficient $\langle x_{o_i,\varphi_j}, x_{o_r,\varphi_s} \rangle$ minimizes the sum of squared differences, which in turn can be approximated by the distance of the projections in eigenspace. This observation leads directly to the recognition method proposed by Nayar and Murase. In order to recover the eigenspace coordinates $g(I)$ of an image $I$ during the recognition stage, the corresponding image vector $y(I)$ is projected into the eigenspace,

$$ g(I) = (e_1, \dots, e_k)^T \, y(I). \tag{2.8} $$


The object $o_m$ with minimum distance $d_m$ between its manifold and $g(I)$ is assumed to be the object in question:

$$ d_m = \min_i \min_j \| g(I) - g_{o_i,\varphi_j} \| . \tag{2.9} $$

This yields both an object hypothesis and a pose estimate.
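Equations (2.8) and (2.9) translate almost directly into code. The sketch below assumes that the eigenvectors $e_1, \dots, e_k$ are given as an orthonormal list and that each manifold is stored only as its discrete set of sampled points $g_{o_i,\varphi_j}$ (no interpolation, in line with the decision not to use the extensions mentioned in the next paragraph). Function names and the dictionary layout are illustrative.

```python
import math


def project(y, basis):
    """Eq. (2.8): g(I) = (e_1, ..., e_k)^T y(I) for an orthonormal basis e_1..e_k."""
    return [sum(e_i * y_i for e_i, y_i in zip(e, y)) for e in basis]


def nearest_manifold_point(g, manifolds):
    """Eq. (2.9): minimise ||g - g_{o_i, phi_j}|| over all stored manifold samples.

    manifolds maps object label -> {pose: eigenspace point}.
    Returns (d_m, object hypothesis o_m, pose estimate).
    """
    best = None
    for obj, samples in manifolds.items():
        for pose, point in samples.items():
            d = math.dist(g, point)
            if best is None or d < best[0]:
                best = (d, obj, pose)
    return best
```

With a two-dimensional eigenspace and manifolds `{"cup": {0: [1, 0], 90: [0, 1]}, "box": {0: [-1, 0], 90: [0, -1]}}`, an image vector projecting to $g = (0.9, 0.1)$ yields the hypothesis "cup" at pose 0, i.e. object and pose are obtained from the same search.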


In order to improve the pose estimation, Murase and Nayar have suggested various extensions of the above algorithm. For example, it is possible to resample the manifold at a higher rate by interpolating intermediate manifold points and their parameter values. Additionally, one can use individual object eigenspaces that are built by taking only images of one specific object for all values of the parameters $\varphi_j$. Once the object hypothesis has been obtained using the universal eigenspace, the image vector $y(I)$ is also projected into the eigenspace of object $o_m$ and a better estimate of the parameter $\varphi_j$ is obtained. None of these extensions will be used in our experiments. Instead, we extend Nayar and Murase's concept of manifolds by introducing probability densities in eigenspace.


Probability distributions in Eigenspace. 


Moghaddam and Pentland [86] used probability densities in eigenspace for the task of 
face detection and recognition. Sipe and Casasent [124] use them for active recognition 
in a manner similar to the probabilistic approach to be discussed below. The recognition 
algorithm of Nayar and Murase has been shown to be a special case of probabilistic rea- 
soning in eigenspace in [26, 27]. 


Let us denote by $p(g|o_i,\varphi_j)$ the likelihood of ending up at point $g$ in the eigenspace after projecting an image of object $o_i$ with pose parameters $\varphi_j$⁹. The likelihood is estimated from a set of sample images with fixed $o_i, \varphi_j$. The samples capture the inaccuracies in the parameters $\varphi$ such as location and orientation of the objects, fluctuations in imaging conditions such as moderate light variations, pan, tilt and zoom errors of the camera, and segmentation errors. Fig. 9.3a depicts an eigenspace representation of the image set of one object which is used to derive the likelihoods $p(g|o_i,\varphi_j)$. Using parameterized probability densities learned from samples, an efficient Bayes classifier can be constructed. Details of this procedure will be described in section 8.1.1. In the discussion of our active fusion algorithm in chapter 3 we will assume only that some sort of classifier (not necessarily of a probabilistic nature) exists which delivers numerical values that quantify the uncertainty associated with each possible object-pose hypothesis.
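As an illustration of such a classifier (the procedure actually used is described in section 8.1.1), the following sketch assumes one isotropic Gaussian density per object-pose hypothesis, fitted to the projected sample images, and a uniform prior. It returns posterior confidences $P(o_i,\varphi_j|g)$ and marginalizes them over the poses to obtain object confidences. The isotropic model and all names are simplifying assumptions.

```python
import math
from statistics import fmean


def fit_isotropic_gaussian(samples, min_sigma=1e-6):
    """Mean vector and one pooled standard deviation from projected sample images."""
    dim = len(samples[0])
    mean = [fmean(s[d] for s in samples) for d in range(dim)]
    var = fmean((s[d] - mean[d]) ** 2 for s in samples for d in range(dim))
    return mean, max(math.sqrt(var), min_sigma)


def log_likelihood(g, mean, sigma):
    """log p(g | o_i, phi_j) under an isotropic Gaussian N(mean, sigma^2 I)."""
    sq = sum((a - b) ** 2 for a, b in zip(g, mean))
    return -0.5 * sq / sigma ** 2 - len(g) * math.log(sigma * math.sqrt(2 * math.pi))


def posterior(g, models):
    """P(o_i, phi_j | g) for a uniform prior; models maps (obj, pose) -> (mean, sigma)."""
    logs = {h: log_likelihood(g, m, s) for h, (m, s) in models.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    w = {h: math.exp(v - top) for h, v in logs.items()}
    z = sum(w.values())
    return {h: v / z for h, v in w.items()}


def object_confidences(post):
    """Marginalise the joint posterior over the poses: P(o_i | g)."""
    conf = {}
    for (obj, _pose), p in post.items():
        conf[obj] = conf.get(obj, 0.0) + p
    return conf
```

Working in log space avoids underflow when many hypotheses lie far from the observed point $g$.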


2.2.1 The Essence of the Eigenspace Method 


We would like to conclude this introduction of the eigenspace method for object recogni- 
tion with a few comments on what we believe are its most important features. 


The approach contains a precise description of how to establish a unary feature space for appearance based object recognition. The unary features are calculated by projecting complete images into the eigenspace (global image descriptors). Other unary feature spaces could well be conceived. For example, one might calculate image moments defined through $M_{pq} := \sum_x \sum_y x^p y^q I(x,y)$, with $x, y$ denoting image coordinates (possibly re-centered), $I(x,y)$ denoting the gray-value at position $(x,y)$ and $p, q \in \mathbb{N}_0$ [56]. Such moments constitute easily computable unary features that may be used instead of the eigenspace coordinates. The object regions are again segmented from the background and images are re-scaled in size before computing the moments¹⁰. One can obtain any desired dimensionality of the feature space simply by defining upper limits on $p$ and $q$. This opens the way to building feature spaces that have as many dimensions as are needed to disambiguate different objects. Recognition can proceed exactly along the same lines. In fact, the idea of organizing object views into manifolds and recognizing new views by calculating the distance of the feature vector to the learned manifolds can be applied to every unary feature space. The reader may also notice that when using moments $M_{pq}$ the off-line learning phase becomes less intricate, because the computations necessary to establish the eigenspace (Karhunen-Loève transform) are not needed. This is also true for a variety of other unary feature spaces.


9We use capital $P$ to indicate probabilities and lower case $p$ to indicate probability densities.

10There is no need to use invariant moments. On the contrary, this would be counter-productive, since it would complicate pose estimation.
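A minimal sketch of the moment features $M_{pq}$ defined above follows. It assumes the segmented and re-scaled gray-value image is given as a list of rows `image[y][x]`; the optional re-centering uses the centroid $(M_{10}/M_{00},\, M_{01}/M_{00})$. The upper limit `max_order` on $p$ and $q$ fixes the feature dimensionality at $(\text{max\_order}+1)^2$. Function names are hypothetical.

```python
def raw_moment(image, p, q, cx=0.0, cy=0.0):
    """M_pq = sum_x sum_y (x - cx)^p (y - cy)^q I(x, y), image given as rows image[y][x]."""
    return sum(((x - cx) ** p) * ((y - cy) ** q) * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def moment_features(image, max_order, recenter=False):
    """Feature vector of all M_pq with 0 <= p, q <= max_order."""
    cx = cy = 0.0
    if recenter:  # shift the coordinate origin to the gray-value centroid
        m00 = raw_moment(image, 0, 0)
        cx = raw_moment(image, 1, 0) / m00
        cy = raw_moment(image, 0, 1) / m00
    return [raw_moment(image, p, q, cx, cy)
            for p in range(max_order + 1)
            for q in range(max_order + 1)]
```

The resulting vectors can be fed to the same nearest-manifold search as the eigenspace coordinates; no Karhunen-Loève transform has to be computed during learning.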


On the other hand, it is exactly the principal component analysis which is one of the major strengths of the eigenspace approach: the eigenspace recognition scheme is able to adapt the feature space to the actual data. However, while it is sometimes mistakenly assumed that all the images from all possible view-points are necessary to construct the eigenspace, one may just as well work with only a limited number of representative views for each object. Alternatively, one may keep an established eigenspace for a larger database of object models than originally conceived. The important requirement for object recognition is that the manifolds of different objects should be well separated. As long as this is the case, there is actually no need to change the feature space. In fact, it has long been recognized that changing the feature space can be quite demanding for eigenspace recognition (because it requires access to all images for the whole model database), and algorithms have been developed to extend existing eigenspaces instead of reconstructing them from scratch [36].


Hence, we conclude that 


• the following discussion on active object recognition could equally well have been based on unary feature spaces other than the eigenspace,


• the unique feature of eigenspace recognition (as seen from our point of view) is the ability of the approach to adapt the feature space to the appearance of the objects. This makes the approach more flexible than others because, after performing a principal component analysis, feature spaces of lower dimensionality can be expected to suffice.


However, for the envisaged research in active object recognition this latter characteristic of eigenspace recognition is largely irrelevant. We will therefore refrain from reconstructing the eigenspace when enhancing the model database in chapter 9, just as we would not increase the dimensionality of a feature space based upon image moments unless forced to do so because the manifolds of different objects started to overlap strongly. In fact, we are actually looking for cases in which manifolds begin to overlap. Static object recognition approaches simply fail to recognize the object in these cases because it is unclear which manifold is closest. We will be able to demonstrate that, using multiple observations, active recognition systems are still able to find the correct hypotheses. This happens essentially because manifolds usually overlap more strongly for certain views of the object than for others. By repositioning the camera it is possible to select precisely those views where the overlap of manifolds is less severe.
2.3 Why Eigenspace Based Object Recognition?


The eigenspace algorithm of Nayar and Murase will be used to perform all object recognition tasks for our active fusion algorithm, which controls the setup on top of the eigenspace recognition module. We have mentioned above that the active fusion algorithm to be presented in chapter 3 could also have been applied on top of various other object recognition modules. Equipped with background knowledge on the most common object recognition approaches and the review of the chosen method, we have indeed been able to sketch a very simple alternative recognition algorithm based upon the extraction of image moments. Hence, a detailed justification of the chosen eigenspace method is needed. This section is devoted to such a discussion from the perspective of active vision applications.


In order to evaluate object recognition systems the following set of questions has been 
summarized by Grimson [57]: 


• Correctness: Is the underlying computational theory correct in the sense that it provides a working model of the considered recognition process?


• Scope: What kinds of objects can be recognized (rigid, articulated, soft), and in what kinds of scenes?


• Efficiency: How much time and memory are required to search the space of alternatives?


• Robustness: Does the method tolerate a reasonable amount of noise and occlusion in the scene, and does it degrade gracefully as those tolerances are exceeded?


To answer these questions for the presented eigenspace technique, let us list the major characteristics of the chosen approach. Eigenspace object recognition is an


• appearance based technique for the task of

• specific object recognition with applications to

• single object scenes containing rigid real world or toy objects,

• using global image features.

The resulting system

• relies on an automatic off-line learning phase,


• has the potential to recognize objects and their poses in real time¹¹. In general, recognition time depends on the time it takes to search for nearby points in a unary feature space. Various algorithms to speed up this search have already been developed (e.g. for the k-NN classifier). Since we use a comparatively small number of objects, simple exhaustive search algorithms will suffice in our experiments.


11This is also true for model databases of larger sizes (compared to today's standards), containing about 100 different objects [88].

Through the use of probability distributions and scaling of images


• the system can be made robust against small object-background segmentation errors and imprecise positioning of the hardware.


Still, some stringent restrictions remain:


• Controlled illumination conditions. Because the applied global descriptors strongly depend on illumination conditions, the approach is not robust under changes of lighting conditions.


• Eigenspace recognition is not robust against occlusion, since it is tailored to single object scenes¹².


This list of constraints and benefits identifies the approach as a potentially successful candidate for industrial object recognition. In addition, the eigenspace approach has some remarkable features which make it useful for our purposes. The goal of this thesis is to demonstrate and investigate the benefits and shortcomings of enhancing object recognition by active components. Since our aim here is not to establish new object recognition approaches but to study the effect of active fusion within an existing algorithm, we have chosen an object recognition scheme which is particularly amenable to being extended by active components. In active vision it is necessary for the system to have some idea about the current state of the recognition process and to be able to estimate what changes can be expected if certain actions are taken. These two issues have determined our choice.


• The chosen unary feature space allows for a clear implementation of the processing chain leading to object hypotheses. The alternative use of relational representations (binary or n-ary feature spaces) would imply the need to segment the image of the object into meaningful parts. Notwithstanding the impressive successes of various segmentation algorithms in constrained application domains, the problem of image segmentation has so far defied a general solution. If only unary feature spaces are applied and only single object scenes are analyzed, a working object-background segmentation algorithm is the only prerequisite.


• The appearance based representation of learned objects allows for a simple estimation of what features (images) to expect when moving the setup. Given the current candidate hypotheses, it is possible to infer the most likely next image by just using the database of object views. The recognition algorithm can then be performed using the hypothesized features. Thus the expected new object and pose confidences can easily be determined. If an object centered representation were chosen instead, it would be necessary to account for self-occlusion to determine the best next viewing position. It is unclear whether this could be done without transforming the 3-D model to a 2-D representation. In any case, sophisticated intermediate processing is required to estimate the effect of changes in viewing position (see [102] for an example). We do not encounter any of these difficulties because the representation we have chosen for the task of object recognition automatically also gives access to the expected features for each new viewing position.


12Note, however, that successful developments are under way for the use of robust versions of eigenspace recognition in cluttered multi-object scenes [20, 78, 53].

• Similarly, if generic instead of specific object recognition algorithms were employed, the changes in image content due to changes in viewing position could only be inferred qualitatively. Since active planning only makes sense if a reasonable expectation of the next observation can be obtained in advance¹³, it is an open question whether the impact of active fusion on generic object recognition will be as fruitful as it is for specific object recognition. By remaining within the domain of specific object recognition we know very reliably what features can appear once the set of all views has been learned.


• Finally, as already mentioned above, because of the restriction to single object scenes we can avoid all registration problems.


In light of the above discussion we would like to point out that our choices have been motivated to a large degree by convenience and do not always indicate true restrictions of the active vision algorithm to be presented in the next chapter. While the actual system we have built can be used only under the stated constraints, the same active vision algorithm can be applied to more sophisticated object recognition schemes. In short, we argue that all factual restrictions can be accepted on the grounds that the major theme of this thesis is not object recognition per se but rather a self-contained demonstration of how existing object recognition paradigms can be enhanced by active fusion components.


13This will be demonstrated for the case of unclear expectations due to outliers in chapter 9. 


Chapter 3 


Active Fusion for Object Recognition 


In this chapter we consider the application of actively planned fusion steps within the 
framework of appearance based object recognition. The next section is devoted to an 
overview of active fusion in object recognition. After reviewing related research in section 
3.2 the actual active fusion algorithm will be presented in section 3.3. 


3.1 Active Fusion in Object Recognition 


In order to avoid confusion during the following discussion, we need to distinguish between two aspects of activity in recognition. The “active vision school” claims that active hardware motions and multiple observations allow for the solution of otherwise ill-defined problems in vision [2]. Active motions do not always need to be planned in order to benefit from fusing information from different images. Sometimes, “planning” consists of quite straightforward prescriptions similar to the following: “In order to solve the stereo problem for largely different views, first move only a little and obtain a coarse 2½-D description of the scene. Then move to the final viewing position and use the additional knowledge about the coarse 2½-D structure to solve the stereo problem with images from largely different view-points.” This “plan” can be translated into a hard-coded algorithm without any need for the system to perform further individual planning steps. Thus, active vision stresses the information fusion point of view but does not necessarily also stress the planning of the next action.


For active fusion approaches, on the other hand, combining and selecting information 
sources are equally important. Hence, the notions “active vision” and “active fusion” 
carry slightly different meanings. Active fusion always implies a planning step that is 
performed by the system in order to dynamically select the next action leading to a new 
observation. 


Active steps in object recognition will lead to striking improvements if the object database contains objects that share similar views or similar internal representations of views. This problem arises not only in large object databases but also as a consequence of imperfect segmentation processes. The key process to disambiguate such objects is a movement of the camera to a new viewpoint from which the two objects appear very distinct. Taking the perspective of active fusion, we will stress the effectiveness of planning the next motion. The faster the system can decide between competing object-pose hypotheses, the more effective our active fusion algorithm will be.


Figure 3.1: Flow-chart of an active fusion module (block labels include Classification and Registration).


Figure 3.1 depicts a flow-chart of procedures necessary for active fusion in object 
recognition. A user query triggers the first action. In our setting the action consists of 
movements of the setup, followed by image acquisition. Other “hard-actions” such as 
changes in illumination or “soft-actions” such as tuning of parameters for image process- 
ing modules may be used in more general systems. Once an action has been performed the 
result of the action has to be evaluated. Thus, processing the image data is the next step. 
Processing will always consist of some sort of feature extraction and grouping, see also 
table 2.2. In the eigenspace example feature extraction consists of segmenting the object 
from the background and projecting the resulting normalized image into the eigenspace. 
The coordinates of the obtained point in eigenspace are the features to be put into a 
classification module. For active object recognition applications this step has to deliver a 
set of object hypotheses and a set of corresponding pose estimations. The next thing to 
do is to fuse the results obtained from the most recent action with the results of previous 
actions. 


In order to do so, it will usually be necessary to interpose a registration step for these information sources. Registration amounts to finding correspondences between features encountered in different images. As mentioned before, we need no registration step in our object recognition example because all encountered features are assumed to originate from the one fixed object under investigation.


Results stemming from different observations have to be fused. Before fusion operators are applied, all classification results corresponding to pose and other physical parameters of the system must be transformed according to the change in physical parameters induced by the actions taken so far. Only after all classification results have been brought to a common frame of reference can the fusion step be performed.
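To make the transformation-then-fusion step concrete, the following sketch handles the pose confidences of a single object hypothesis on a discrete grid of `step`-degree poses. It assumes rotations about the z-axis only and the sign convention suggested by the numbers in Fig. 3.2 (pose seen from $\psi_n$ equals reference pose plus $\psi_n$). Two example aggregation operators are offered: a normalized product (probabilistic fusion under a conditional-independence assumption) and a normalized minimum (a possibilistic conjunction). These are illustrative choices only; the operators actually studied are discussed in chapter 8, and all names are hypothetical.

```python
def to_reference_frame(pose_conf, psi_n, step):
    """Re-index pose confidences measured from camera angle psi_n (degrees).

    pose_conf[k] is the confidence for the pose k*step as *seen* from psi_n.
    Assuming seen pose = reference pose + psi_n (mod 360), the returned list
    is indexed by the reference pose j*step as seen from psi_1 = 0.
    """
    n = len(pose_conf)
    shift = round(psi_n / step) % n
    return [pose_conf[(j + shift) % n] for j in range(n)]


def fuse(conf_a, conf_b, op="product"):
    """Aggregate two confidence vectors that live in the same frame of reference."""
    if op == "product":  # probabilistic: multiply, then normalise to sum 1
        combined = [a * b for a, b in zip(conf_a, conf_b)]
        norm = sum(combined)
    elif op == "min":  # possibilistic: minimum, then normalise to max 1
        combined = [min(a, b) for a, b in zip(conf_a, conf_b)]
        norm = max(combined)
    else:
        raise ValueError(f"unknown operator: {op}")
    return [c / norm for c in combined] if norm > 0 else combined
```

For instance, a confidence peak at a seen pose of 90° from $\psi_n = 70^\circ$ is moved to the reference pose 20°, after which it can be combined with the confidences from the first view.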


After the fusion step is completed, the next action has to be planned, i.e. a decision has to be made on the most promising next action. In object recognition tasks the current pose estimate is a decisive ingredient for this part of the active fusion module. Given an estimated pose, we will perform “virtual” movements of the setup or other “virtual” actions and evaluate their expected utility. The most beneficial action will be ranked first. To estimate the utility of an action it will be necessary to quantify the expected decrease in non-specificity and numerical imprecision when performing the action. In the most general setup the cost of performing an action may also influence the ranking.
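The planning step can be sketched as a one-step look-ahead. For each candidate rotation, every current object-pose hypothesis predicts its learned view as the next feature vector (the “virtual” observation), the belief is updated with that virtual observation, and the resulting ambiguity is averaged with weights given by the current belief. The sketch assumes probabilistic beliefs, isotropic Gaussian likelihoods with a common `sigma`, noise-free predicted views, poses on a regular grid indexed in the reference frame, and Shannon entropy as a stand-in for the non-specificity and imprecision measures developed in this thesis; action costs are ignored. Names and data layout are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy of a normalised belief {hypothesis: probability}."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)


def update(belief, g, views, s, sigma):
    """Bayes update of the belief over (obj, j) given feature g observed at camera step s.

    views[obj][k] is the learned eigenspace point of obj at seen pose index k;
    the reference pose j is seen at index (j + s) mod n.
    """
    new = {}
    for (obj, j), b in belief.items():
        point = views[obj][(j + s) % len(views[obj])]
        d2 = sum((a - c) ** 2 for a, c in zip(g, point))
        new[(obj, j)] = b * math.exp(-0.5 * d2 / sigma ** 2)
    z = sum(new.values())  # > 0 whenever g is the view of a hypothesis with b > 0
    return {h: v / z for h, v in new.items()}


def expected_entropy(action, belief, views, s_now, sigma):
    """Belief-weighted entropy after rotating by `action` grid steps (virtual observations)."""
    s_next = s_now + action
    total = 0.0
    for (obj, j), weight in belief.items():
        if weight > 0:
            g_virtual = views[obj][(j + s_next) % len(views[obj])]
            total += weight * entropy(update(belief, g_virtual, views, s_next, sigma))
    return total


def best_action(actions, belief, views, s_now, sigma):
    """Rank actions by expected remaining ambiguity and return the best one."""
    return min(actions, key=lambda a: expected_entropy(a, belief, views, s_now, sigma))
```

If two objects look alike except from one side, the action that rotates the camera to that side yields a near-zero expected entropy and is ranked first.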


It may happen at this stage that there are no more reasonable actions left. This indicates unsuccessful termination. Other criteria for termination are checked during the termination step. To this end the system evaluates the ambiguity present in the distribution of confidences for all possible hypotheses. Depending on the result of this evaluation, another action will be performed or the final result will be returned. Measuring ambiguity is thus important both for view-planning and for deciding upon termination.
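A termination test of the kind described above might look as follows. It assumes normalized probabilistic object confidences and uses Shannon entropy divided by its maximum value $\log N$ as the ambiguity measure, in place of the measures developed later in this thesis; the threshold value and the names are hypothetical.

```python
import math


def normalized_ambiguity(conf):
    """Shannon entropy of normalised confidences divided by log(N); lies in [0, 1]."""
    if len(conf) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in conf.values() if p > 0)
    return h / math.log(len(conf))


def termination_state(conf, remaining_actions, max_ambiguity=0.2):
    """'success' if ambiguity is low enough, 'failure' if no reasonable action
    is left, otherwise 'continue' with the next planned action."""
    if normalized_ambiguity(conf) <= max_ambiguity:
        return "success"
    if not remaining_actions:
        return "failure"
    return "continue"
```

The success test is checked first, so a run that becomes unambiguous with its last available action still terminates successfully.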


3.2 Related Research in Active Object Recognition


The general idea of active view-point selection has already been applied to object recog- 
nition by many other researchers; see for example [129] for a survey of sensor planning 
and active vision in computer vision. Among those whose work is more closely related to 
ours we find Hutchinson and Kak [65], Callari and Ferrie [35], Sipe and Casasent [124] 
and Kovačič, Leonardis, and Pernuš [72]. All of them pursue a similar aim: given a digital
image of an object, the objective of the system is to actively determine the identity of that 
object and to estimate its pose in the scene. The identity or pose of the object may be 
ambiguous from a given view. The algorithms include mechanisms to detect ambiguities 
and to select viewpoints which resolve these ambiguities. 


Hutchinson and Kak describe an active object recognition system based on Dempster- 
Shafer belief accumulation. We will discuss some details of their implementation after 
having presented our evidence theoretic approach. The system of Hutchinson and Kak 
performs those actions which minimize a newly defined measure of ambiguity. They use 
various actively controlled sensors such as a range-finder and a CCD-camera. The exper- 
iments are performed in a blocks-world environment with very simple objects. 
Callari and Ferrie base their active object recognition system on model-based shape,
pose and position reconstructions from range data. They estimate Bayesian probabilities 
for the object hypotheses and choose those steps that minimize the expected ambiguity 
measured by Shannon entropy. 


Sipe and Casasent describe a system that also uses an eigenspace representation. They perform probabilistic reasoning for the object hypotheses but do not fuse the pose estimates. In order to derive their active vision scheme they assume the variances of the likelihood distributions to be constant for each object. This is a strong assumption that neglects the variability of the actual data (see also [26]). For planning, they suggest learning the most discriminating viewpoint for each pair of objects, and they always perform the action that moves the camera to the position associated with the two objects that obtain the highest confidences at the current step. Thus it is not necessarily the overall uncertainty of all object hypotheses which is minimized in their approach. The introduction of more sophisticated goals, such as recognition with minimal camera motion or recognition with highly precise pose estimates, is not considered.


Kovačič, Leonardis, and Pernuš cluster similar views in feature space using a crisp clustering algorithm. The system calculates the resulting new clusters for each possible action and chooses the action which maximally separates views originally belonging to the same cluster. Doing this off-line for all obtained crisp clusters, they compile a complete recognition-pose-identification plan, a tree-like structure which encodes the best next view relative to the current one. This tree is traversed during active runs. Their approach in fact amounts to a multi-step look-ahead planning procedure with a pre-compiled plan of action sequences. The plan contains every possible situation that may arise and the corresponding action to be taken. The “philosophy” of the approach is thus quite different from our active recognition loop depicted in Fig. 3.1 and discussed in section 3.1. While our point of view stresses the dynamic aspect of active fusion, the work of Kovačič, Leonardis, and Pernuš results in a static scenario in which only the hardware performs (pre-determined) active steps. As long as both approaches are deterministic¹, the two paradigms for active vision (dynamic reactions vs. static plans) are in fact equivalent on a higher level of abstraction: at least in principle, it is possible to establish pre-compiled action plans for every deterministic system. We have refrained from doing so for our single-step look-ahead system because a full enumeration of possible situations works best for recognition algorithms that deliver hard decisions. It is not obvious how to incorporate mechanisms able to deal with the exponential increase in possibilities when soft decisions are allowed without substantially altering the algorithm of Kovačič, Leonardis, and Pernuš.


Previous work has also been reported on planning sensing strategies. Murase and Nayar [87] have presented an approach for illumination planning in object recognition by searching for regions in eigenspace where object manifolds are best separated. A conceptually similar but methodologically more sophisticated strategy for view-planning will be presented below. Murase and Nayar have limited their approach to an off-line planning phase using no active steps.


1This is the case for our work and for the work of Kovačič, Leonardis, and Pernuš.


Finally, we would like to mention the work of Lucas Paletta, who approaches the active fusion problem through machine learning algorithms [97, 96]. The employed reinforcement learning algorithms require an extensive off-line learning phase during which the system automatically constructs a mapping from the current state to the best next action. The learning phase returns an exhaustive action plan (encoded through neural network architectures) which determines for each possible situation the most likely best next action. This aspect of Paletta's work resembles the active vision algorithm of Kovačič, Leonardis, and Pernuš. But while the latter give an explicit description of how to construct an action plan, the reinforcement approach chosen by Paletta is able to extract such a plan automatically. Furthermore, the representation chosen by Paletta allows for soft decisions instead of hard classifications. Two implementations are described in [96]:


• In the first implementation the current state is represented by cumulative confidence values for the object-pose hypotheses. A multi-layer perceptron is used to model the mapping from the current state to the next action.


• In a second implementation the current state is represented by the encountered feature values $(g_1, \dots, g_n)$ and a radial basis function network is used to map states to actions.


While a lot of computation time is needed to construct the action plans (i.e. the neural nets), the complexity of test runs is determined solely by the number of computational steps needed for retrieving the best action from the neural networks (“remembering what has been learned”). As no dynamic planning is performed, these approaches usually outperform our action selection algorithm in terms of speed. In section 3.4 we will compare the complexity of these algorithms with the complexity of the view-planning algorithm proposed in this thesis.


3.3 Active Object Recognition 


In this section we begin the actual description of our active fusion algorithm. To fix ideas, one may assume the use of fuzzy aggregation operators in the following presentation, but the overall description will be general enough to also provide a common framework for the use of different uncertainty calculi in active fusion. Particular details of these uncertainty calculi will be discussed in chapter 8 and during the presentation of the actual experiments.


We denote the values of the hardware parameters of our system before step $n$ by $\psi_n$. They may include location and pose of the camera, light conditions etc. We will refer to $\psi_n$ as a position parameter in the following. An active step can be parameterized by coordinate changes $\Delta\psi_{n+1}$. Every motion of the hardware induces a transformation of parameters

$$ \psi_{n+1} = T_{\Delta\psi_{n+1}} \, \psi_n \tag{3.1} $$
which is known to the system. In our experiments we will study only rotations around the z-axis. This does not indicate any restriction of the presented active fusion algorithm, as the same approach can be used for two or more degrees of freedom [26]. In our case the parameter $\psi_n$ measures the angular position of the camera, while $\Delta\psi_n$ denotes the angle at the center of the traversed sector.


The object's pose will be denoted by the variable $\varphi$. By pose we understand any parameters describing geometrical properties of the object, including the location of the object. Estimating the pose only makes sense in conjunction with an object hypothesis. For different object hypotheses the related most likely pose estimates will in general also be different. The pose $\varphi$ will be measured with respect to an object-centered coordinate system, while the system's coordinates $\psi$ are usually measured in a hardware centered coordinate system³. In our case the pose variable $\varphi$ is an angular variable and indicates rotations of the object around the object's z-axis, which coincides with the system's z-axis. The situation is depicted schematically in figure 3.2, where a top-view of the hardware setup is indicated (the setup will be described in more detail before the discussion of the experiments in chapter 9). In the considered example the object lies on a turn-table and the camera is initially positioned at $\psi_1 = 0^\circ$, capturing an image of the object at pose $\varphi = 20^\circ$. At step $n$ the camera moves from $\psi_n$ to $\psi_{n+1}$. In general the operator $T_{\Delta\psi_{n+1}}$ will describe any movement in 3D-space and the parameters $\psi_n$ and $\Delta\psi_{n+1}$ will include location and pose degrees. Since in our experiments only rotations around the z-axis need to be described, eq. (3.1) reduces to


$$ \psi_{n+1} = \Delta\psi_{n+1} + \psi_n . \tag{3.2} $$
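For convenience we shall also write

$$ T^n := T_{\Delta\psi_n} \circ \dots \circ T_{\Delta\psi_1} . \tag{3.3} $$

Thus $\psi_n = T^n \psi_1$. In our experiments we simply have

$$ \psi_n = \Delta\psi_n + \dots + \Delta\psi_1 + \psi_1 . \tag{3.4} $$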
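For the single rotational degree of freedom used in the experiments, eqs. (3.2) and (3.4) reduce to accumulating angles modulo 360°. The sketch below also encodes the relation between the pose seen from $\psi_n$ and the pose seen from $\psi_1 = 0^\circ$ as $\varphi_{\text{seen}} = \varphi + \psi_n$; this sign convention is an assumption chosen to reproduce the numbers in Fig. 3.2 (20° seen from 0°, 90° seen from 70°). Function names are hypothetical.

```python
def camera_positions(deltas, psi_1=0.0):
    """Eq. (3.2) applied repeatedly: psi_{n+1} = delta_psi_{n+1} + psi_n (degrees, mod 360).

    deltas holds delta_psi_2, delta_psi_3, ...; delta_psi_1 = 0 as in the experiments.
    """
    positions = [psi_1 % 360.0]
    for d in deltas:
        positions.append((positions[-1] + d) % 360.0)
    return positions


def seen_pose(phi, psi):
    """Pose observed from camera angle psi for an object whose pose is phi as seen from 0 deg."""
    return (phi + psi) % 360.0
```

Because rotations about a single axis commute, the accumulated position equals the sum in eq. (3.4) taken modulo 360°.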


3.3.1 View Classification and Pose Estimation 


At position w, an image I, is captured. The image is processed and a set of features 
Zn = n(n) is extracted. Processing of the images consists in segmentation, re-scaling 
and projection into the eigenspace thereby obtaining the feature vector gp. 


Given the input image $I_n$ and the hardware parameters $\psi_n$, we want the object
recognition system to deliver classification results for the object hypotheses and
corresponding pose estimates. Details of this procedure will be described in chapter 8. Here


²The parameters used in the eigenspace representation may actually consist of any combination of
hardware and pose variables. For the sake of clarity we assume in the following that the eigenspace
coordinates are related only to the object-centered pose variable $\varphi$.

³In general the transformation from object-centered coordinates to hardware-centered coordinates will
be more involved, as the different axes do not have to coincide.

⁴In fact we also have $\Delta\psi_1 = 0°$ and we will always start with $\psi_0 = 0°$.
Figure 3.2: Schematic top-view of the hardware setup used in the experiments. An object is
placed on a round table and three different camera positions $\psi_1$, $\psi_n$, $\psi_{n+1}$ are
indicated. First the camera is positioned at $\psi_1 = 0°$. The object’s pose $\varphi$ is measured
in an object-centered coordinate system while the hardware coordinates $\psi$ are
measured in a hardware-centered coordinate system. The object’s pose as seen from
$\psi_1 = 0°$ is $\varphi = 20°$. From the view-point $\psi_n = 70°$ the object’s pose is $\varphi = 90°$. At
step $n$ the camera moves from $\psi_n$ to $\psi_{n+1}$ and traverses an angular sector $\Delta\psi_{n+1}$.


we denote the numerical values that quantify the system’s confidence in the object
hypotheses by $c(o_i|I_n,\psi_n)$ and the pose estimates by $c(\hat\varphi_j|o_i,I_n,\psi_n)$, where the hat on $\hat\varphi_j$
indicates that this is the pose estimate delivered by the system when capturing
the image $I_n$ from position $\psi_n$ (see also the discussion of eq. (3.7) below). The notation is
clearly inspired by the conventional notation of conditional probabilities, but in fact the
$c(\ldots)$’s will stand for probabilities, possibilities, fuzzy confidences and evidence masses.
Even though the interpretation of these quantities is radically different for the considered
approaches, they can be presented in a formally similar manner. To fix nomenclature we
denote the confidence for the combined object-pose hypothesis $(o_i,\hat\varphi_j)$ by $c(o_i,\hat\varphi_j|I_n,\psi_n)$
and assume that it can be related to the confidences $c(o_i|I_n,\psi_n)$ and $c(\hat\varphi_j|o_i,I_n,\psi_n)$. This
is possible through equations of the following type


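$$c(o_i,\hat\varphi_j|I_n,\psi_n) = c(\hat\varphi_j|o_i,I_n,\psi_n) \wedge c(o_i|I_n,\psi_n) \qquad (3.5)$$
and
$$c(o_i|I_n,\psi_n) = \bigvee_{\hat\varphi_j} c(o_i,\hat\varphi_j|I_n,\psi_n) \qquad (3.6)$$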


where $\wedge$ is an appropriate conjunctive operation while $\vee$ is disjunctive. In the
probabilistic case $\wedge$ and $\vee$ are given by the usual multiplication and summation operators. For
the other uncertainty calculi we will give the detailed expressions below. The meaning of
the first equation is that the confidence $c(o_i,\hat\varphi_j|I_n,\psi_n)$ for the compound hypothesis for
object and pose $(o_i,\hat\varphi_j)$ is given by the confidence $c(\hat\varphi_j|o_i,I_n,\psi_n)$ for the pose hypothesis
combined conjunctively with the confidence for the object hypothesis. Thus the implicit
“and” in the confidence $c(o_i,\hat\varphi_j|I_n,\psi_n)$ for the object and pose hypothesis on
the left side of eq. (3.5) is translated into an explicit “and” ($\wedge$) on the right side of eq. (3.5).
The second equation expresses that the confidence for the object hypothesis alone can be
obtained from the knowledge of all compound hypotheses by an appropriate disjunctive
operation. Since the object must assume one of its poses, the confidence for the object
hypothesis alone should be at least as strong as the maximum confidence obtained for the
more restricted combined object-pose hypotheses. Therefore we use a disjunctive operator
in eq. (3.6).
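As a minimal sketch of eqs. (3.5) and (3.6), assuming confidences are stored as plain Python lists indexed by object $i$ and discrete pose $j$, the conjunctive and disjunctive operations can be passed in as functions. Multiplication and summation give the probabilistic case named in the text; the pair min/max is shown only as an illustrative assumption for a possibility-like calculus, not as the operators defined later in this thesis. The function names are hypothetical.

```python
import operator
from functools import reduce

def joint_confidence(c_obj, c_pose_given_obj, conj):
    """Eq. (3.5): c(o_i, phi_j) = c(phi_j | o_i) AND c(o_i), for every i and j."""
    return [[conj(c_pose, c_obj[i]) for c_pose in row]
            for i, row in enumerate(c_pose_given_obj)]

def object_confidence(c_joint, disj):
    """Eq. (3.6): c(o_i) = OR over all poses j of c(o_i, phi_j)."""
    return [reduce(disj, row) for row in c_joint]

# Probabilistic case: AND = multiplication, OR = summation.
c_obj = [0.6, 0.4]
c_pose = [[0.5, 0.25, 0.25],   # pose confidences given object 0 (sum to 1)
          [0.1, 0.8, 0.1]]     # pose confidences given object 1 (sum to 1)
joint = joint_confidence(c_obj, c_pose, operator.mul)
print(object_confidence(joint, operator.add))   # approximately [0.6, 0.4]

# Illustrative possibility-like pair (assumption): AND = min, OR = max.
poss_pose = [[1.0, 0.5], [0.2, 1.0]]
poss_joint = joint_confidence([0.7, 0.3], poss_pose, min)
print(object_confidence(poss_joint, max))        # [0.7, 0.3]
```

In both cases the object confidence is at least as large as the largest joint confidence for that object, which is exactly the property the argument above demands of the disjunctive operator.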


The feature vectors $g_n$ are determined by the input images $I_n$. We will therefore also
use the notation $c(o_i|g_n,\psi_n) = c(o_i|I_n,\psi_n)$, $c(\hat\varphi_j|o_i,g_n,\psi_n) = c(\hat\varphi_j|o_i,I_n,\psi_n)$
and $c(o_i,\hat\varphi_j|g_n,\psi_n) = c(o_i,\hat\varphi_j|I_n,\psi_n)$.


Due to the performed motions, the pose estimate $c(\hat\varphi_j|o_i,I_n,\psi_n)$ at step $n$ is
not expressed in the same coordinate reference system as the pose estimate at step 1.
This is exemplified in figure 3.2, where the object’s pose as seen from $\psi_1 = 0°$ is 20°;
hence $c(\hat\varphi_j|o_i,I_1,\psi_1)$ should assume its maximum for $\hat\varphi_j = 20°$. From the view-point
$\psi_n = 70°$ the object’s pose is 90°. Assuming a good pose estimation, both confidences
$c(\hat\varphi_j = 20°|o_i,I_1,\psi_1 = 0°)$ and $c(\hat\varphi_j = 90°|o_i,I_n,\psi_n = 70°)$ will be high while other poses
will receive lower confidences. Clearly these two confidence values belong
together, in the sense that both favor the same physical pose of the object, which
has to be evaluated in a fixed frame of reference. We take that frame of reference to be
the frame used for the object-centered coordinates. Hence the confidences for the
physical pose $\varphi$ (now without hat because it always refers to the same frame of reference)
are given by


$$c(\varphi_j|o_i,I_n,\psi_n) := c(\hat\varphi_j = T^n\varphi_j\,|\,o_i,I_n,\psi_n) = c(\hat\varphi_j = \varphi_j + \psi_n\,|\,o_i,I_n,\psi_n) \qquad (3.7)$$


where the right hand side is the simpler expression that is obtained in our setting.
To simplify notation we will no longer indicate the dependence on $\psi_n$ and will write only
$c(\varphi_j|o_i,I_n)$ instead of $c(\varphi_j|o_i,I_n,\psi_n)$ in the following.
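Eq. (3.7) amounts to re-indexing the current pose estimate by the camera position. The sketch below assumes, for illustration only, a discrete grid of 72 poses in 5° steps and pose confidences held in a list; the function name `to_physical_frame` is hypothetical. It reproduces the numbers of figure 3.2.

```python
def to_physical_frame(c_hat, psi_n_deg, step_deg=5):
    """Eq. (3.7) on a discrete pose grid: c(phi_j) = c(phi_hat = phi_j + psi_n).

    c_hat[k] is the confidence for the observed pose k * step_deg, measured in
    the frame of the current view psi_n. Angles wrap around at 360 degrees.
    """
    n = len(c_hat)
    if n * step_deg != 360:
        raise ValueError("pose grid must cover exactly 360 degrees")
    if psi_n_deg % step_deg != 0:
        raise ValueError("camera position must lie on the pose grid")
    shift = int(psi_n_deg // step_deg)
    # Physical pose index j corresponds to observed index (j + shift) mod n.
    return [c_hat[(j + shift) % n] for j in range(n)]

# Figure 3.2: seen from psi_n = 70 deg the object appears at 90 deg.
c_hat = [0.0] * 72
c_hat[90 // 5] = 1.0
c_phys = to_physical_frame(c_hat, 70)
print(c_phys.index(1.0) * 5)   # 20, the pose seen from psi_1 = 0 deg
```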


3.3.2 Information Integration 


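The currently obtained confidences $c(o_i|I_n)$, $c(\varphi_j|o_i,I_n)$ and $c(o_i,\varphi_j|I_n)$ are used to update
the total confidences $c(o_i|I_1,\ldots,I_n)$, $c(\varphi_j|o_i,I_1,\ldots,I_n)$ and $c(o_i,\varphi_j|I_1,\ldots,I_n)$:


$$c(o_i|I_1,\ldots,I_n) = F_o\big[c(o_i|I_1,\ldots,I_{n-1}),\, c(o_i|I_n)\big] \qquad (3.8)$$
$$c(\varphi_j|o_i,I_1,\ldots,I_n) = F_\varphi\big[c(\varphi_j|o_i,I_1,\ldots,I_{n-1}),\, c(\varphi_j|o_i,I_n)\big] \qquad (3.9)$$
$$c(o_i,\varphi_j|I_1,\ldots,I_n) = F_{o\varphi}\big[c(o_i,\varphi_j|I_1,\ldots,I_{n-1}),\, c(o_i,\varphi_j|I_n)\big] \qquad (3.10)$$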


Other terms could be involved in the above combination schemes, but we will focus only
on combinations of the obtained confidences. The fusion operators $F_o$, $F_\varphi$ and $F_{o\varphi}$ will
usually be conjunctive in order to stress the importance of decisive observations (see the
discussions below). Pose estimates are always updated for the physical pose as
measured with respect to the fixed coordinate system of the setup.
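A minimal sketch of the recursive update of eq. (3.8) in the probabilistic case follows. It assumes (an assumption for illustration; the actual operators are given in chapter 8) that views are conditionally independent and priors are uniform, so that $F_o$ reduces to a normalized product. The product is conjunctive: one decisive view suppresses the hypotheses it contradicts even when earlier views were ambiguous.

```python
def fuse_product(previous, current):
    """Normalized product of two confidence vectors (assumed form of F_o)."""
    product = [p * c for p, c in zip(previous, current)]
    total = sum(product)
    if total == 0.0:
        raise ValueError("the two confidence vectors are fully contradictory")
    return [x / total for x in product]

# Step 1 cannot separate objects 0 and 1; step 2 is decisive.
after_step1 = [0.45, 0.45, 0.10]
step2 = [0.90, 0.05, 0.05]
after_step2 = fuse_product(after_step1, step2)
print([round(x, 3) for x in after_step2])   # [0.936, 0.052, 0.012]
```

The same function could serve for eq. (3.9), provided both pose vectors have first been expressed in the physical frame via eq. (3.7).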
If the operators are not associative, we obtain different results when we instead fuse all the
individually obtained classifications at each step:


$$c(o_i|I_1,\ldots,I_n) = F_o^n\big[c(o_i|I_1),\ldots,c(o_i|I_n)\big] \qquad (3.11)$$
$$c(\varphi_j|o_i,I_1,\ldots,I_n) = F_\varphi^n\big[c(\varphi_j|o_i,I_1),\ldots,c(\varphi_j|o_i,I_n)\big] \qquad (3.12)$$
$$c(o_i,\varphi_j|I_1,\ldots,I_n) = F_{o\varphi}^n\big[c(o_i,\varphi_j|I_1),\ldots,c(o_i,\varphi_j|I_n)\big] \qquad (3.13)$$


Now the fusion operators carry a superscript because at the $n$-th step they have $n$
arguments. The two approaches and the necessary pose transformations are depicted
schematically in figures 3.3 and 3.4. In general, any combination of the above equations
plus eqs. (3.5) and (3.6) may be applied to obtain the total confidences. We will perform
our experiments only with the recursive updating of eqs. (3.8)-(3.10), as this does not
require on-line storage of all the intermediate results.
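To illustrate the remark on associativity, the sketch below compares the recursive scheme with the batch scheme for two operators chosen purely as examples (neither is claimed to be an operator used in chapter 8): a normalized product, which is associative, and a pairwise arithmetic mean, which is commutative but not associative. For the product both schemes agree; for the mean the recursive scheme weights the three views 1/4, 1/4, 1/2 instead of 1/3 each, so the results differ. All function names are hypothetical.

```python
def fuse_recursively(pairwise_op, observations):
    """Recursive scheme of eqs. (3.8)-(3.10): fold the pairwise operator step by step."""
    fused = observations[0]
    for obs in observations[1:]:
        fused = pairwise_op(fused, obs)
    return fused

def normalized(v):
    total = sum(v)
    return [x / total for x in v]

def product_pair(a, b):          # associative (after normalization)
    return normalized([x * y for x, y in zip(a, b)])

def product_all(observations):   # batch version F^n of the product
    result = [1.0] * len(observations[0])
    for obs in observations:
        result = [r * x for r, x in zip(result, obs)]
    return normalized(result)

def mean_pair(a, b):             # commutative but NOT associative
    return [(x + y) / 2 for x, y in zip(a, b)]

def mean_all(observations):      # batch version F^n of the mean
    n = len(observations)
    return [sum(col) / n for col in zip(*observations)]

obs = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
print([round(x, 3) for x in fuse_recursively(product_pair, obs)])  # [0.467, 0.533]
print([round(x, 3) for x in product_all(obs)])                     # [0.467, 0.533]
print([round(x, 3) for x in fuse_recursively(mean_pair, obs)])     # [0.425, 0.575]
print([round(x, 3) for x in mean_all(obs)])                        # [0.5, 0.5]
```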
Figure 3.3: Fusing the pose estimates by integrating the new confidences with the already
integrated confidences from previous steps. See eq. (3.9).


3.3.3 Action Planning 


Figure 3.4: Fusing the pose estimates by integrating at each step all the confidences
encountered so far. See eq. (3.12).


Since active fusion has been defined to be an active extension of information fusion that
is concerned not only with fusion but also with the selection of information sources, the
action planning module is one of the two major modules in our algorithm⁵. For action
planning we propose to rely on a score $s_n(\Delta\psi)$ attributed to each possible motion $\Delta\psi$ of
the system. The action obtaining the highest score will be selected next:


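$$\Delta\psi_{n+1} = \arg\max_{\Delta\psi} s_n(\Delta\psi) \qquad (3.14)$$
The score measures the expected utility of action $\Delta\psi$. It is thus given by


$$s_n(\Delta\psi) = \sum_{o_i}\sum_{\varphi_j} c(o_i,\varphi_j|I_1,\ldots,I_n)\; U(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) \qquad (3.15)$$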


The term $U(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n)$ measures the utility of action $\Delta\psi$ in case $o_i$, $\varphi_j$ were the
correct object and pose hypotheses. During the calculation of the score $s_n(\Delta\psi)$, the
utility of an envisaged action is weighted by the confidence the system has in that
particular object-pose hypothesis.
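Eqs. (3.14) and (3.15) can be sketched as a confidence-weighted sum followed by an argmax. The object names, the pose values and the utility table below are made-up placeholders standing in for $U$ of eq. (3.20); the functions `score`, `best_action` and `utility` are hypothetical names.

```python
def score(action, joint_conf, utility):
    """Eq. (3.15): utility of `action` weighted by each object-pose confidence."""
    return sum(conf * utility(action, obj, pose)
               for (obj, pose), conf in joint_conf.items())

def best_action(actions, joint_conf, utility):
    """Eq. (3.14): the action with the highest score (first one wins on ties)."""
    return max(actions, key=lambda a: score(a, joint_conf, utility))

# Hypothetical toy example: two competing object hypotheses at pose 0 deg.
joint_conf = {("object A", 0): 0.7, ("object B", 0): 0.3}
utility_table = {
    (90, "object A", 0): 0.9, (90, "object B", 0): 0.2,
    (180, "object A", 0): 0.4, (180, "object B", 0): 0.9,
}

def utility(action, obj, pose):
    """Hypothetical stand-in for U(delta_psi | o_i, phi_j, I_1..I_n)."""
    return utility_table[(action, obj, pose)]

print(score(90, joint_conf, utility), score(180, joint_conf, utility))  # about 0.69 and 0.55
print(best_action([90, 180], joint_conf, utility))                      # 90
```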


The utility of an action depends on various factors such as the cost $C(\Delta\psi)$ of action
$\Delta\psi$. In most applications, actions have to be considered more useful if they lead to
less ambiguous and more specific object hypotheses. Sometimes it will also be necessary
to obtain very precise pose estimates; actions which lead to an increased accuracy
of the pose estimate should then be considered useful.


We will discuss in detail how to measure the non-specificity of object hypotheses and
the numerical imprecision of pose estimates in chapters 6 and 7. For the
moment let us assume that we have such measures at our disposal. To give concrete
examples: in probability theory the non-specificity of the object hypotheses $c(o_i|g_1,\ldots,g_n)$
will be measured by Shannon entropy (see also eq. (8.8)). Other entropy-like measures
will be used for other uncertainty calculi. For the following discussion let us denote the
chosen measure of non-specificity by $H(o|g_1,\ldots,g_n)$. It measures how ambiguous the
object hypotheses are. The accuracy of pose estimates $c(\varphi_j|o_i,g_1,\ldots,g_n)$ is quite often
measured through the standard deviation or the variance of the distribution⁶ (see also
eq. (8.11)). Again, let us only assume that we have a measure of numerical imprecision
$I(\varphi|o_i,g_1,\ldots,g_n)$ at our disposal which tells us whether the pose estimation is still very
inaccurate or not.


⁵The other one being the fusion module.


Given these definitions, the utility of an action can be considered to be a function
of the cost of the action, the lowest expected decrease in non-specificity, and the lowest expected
decrease in numerical imprecision.


The lowest expected reduction in non-specificity is given by 


$$\Delta H(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = H(o|g_1,\ldots,g_n) - \max_{s} H(o|g_1,\ldots,g_n,g^s) \qquad (3.16)$$


In eq. (3.16) the feature vectors $g^s$, $s = 1,\ldots,N$, correspond to characteristic samples of
features which can be expected for object $o_i$ at pose $T_{\Delta\psi}\circ T^n\varphi_j = \Delta\psi + \psi_n + \varphi_j$. The
quantity $\Delta\psi + \psi_n + \varphi_j$ denotes the pose which is expected to be observed in case the original
physical pose was $\varphi_j$, the system has already moved to position $\psi_n$, and it is now performing
step $\Delta\psi$. We may make the dependence of the sample vectors $g^s$ on the object $o_i$ and the
position $T_{\Delta\psi}\circ T^n\varphi_j$ more explicit by writing $g^s = g^s(o_i, T_{\Delta\psi}\circ T^n\varphi_j) = g^s(o_i, \Delta\psi + \psi_n + \varphi_j)$.


In our experiments the samples $g^s$ have been generated from the learned likelihoods
$p(g|o_i,\varphi_j)$. Since we are using an appearance based approach to object recognition, these
feature values have already been learned during the off-line training phase of the object-pose
classifier. In general, the expected feature values have to be learned from examples
or generated from stored object models.


Note that the term $H(o|g_1,\ldots,g_n,g^s)$ on the right hand side of eq. (3.16) implies a
complete tentative fusion step performed with the hypothetically obtained features $g^s$ at
position $T_{\Delta\psi}\circ T^n\varphi_j$.


If we have estimated the likelihoods $p(g^s|o_i,T_{\Delta\psi}\circ T^n\varphi_j)$, we can replace the
cautious maximum operation applied in eq. (3.16) by an average to compute an expectation
value:


$$\Delta H(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = H(o|g_1,\ldots,g_n) - \int_V p(g|o_i,T_{\Delta\psi}\circ T^n\varphi_j)\, H(o|g_1,\ldots,g_n,g)\, dg \qquad (3.17)$$


⁶In chapter 7 we will find that this is surely the most common but definitely not the
best measure of numerical imprecision if the active vision algorithm is based on probability theory.
where the integral runs in principle over the whole feature space $V$ but is again approximated
by taking samples $g^s$. Once the set of samples $g^s(o_i, T_{\Delta\psi}\circ T^n\varphi_j)$ has been defined,
the likelihoods and the confidences $c(o_i,\varphi_j|g^s)$ which are needed for the evaluation of
$H(o|g_1,\ldots,g_n,g^s)$ can be computed off-line. Comparing eq. (3.16) with eq. (3.17), it
is obvious that the latter will be more accurate in estimating the expected decrease
in non-specificity whenever the likelihoods can be estimated accurately. In fact, eq. (3.17)
is by definition the expected decrease in non-specificity. On the other hand,
it will not always be possible to obtain the likelihoods required in eq. (3.17). It should,
however, always be possible to obtain at least some samples of the feature vectors $g^s$
for each pose of the object in an off-line learning phase. The expected decrease in
non-specificity might then be approximated by some sort of averaging. Instead of averaging,
we suggest in eq. (3.16) to stay on the “safe side” and to take the minimum expected
decrease in non-specificity as estimated using a few representative samples $g^s$.
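The difference between the cautious estimate of eq. (3.16) and the expectation of eq. (3.17) can be sketched with Shannon entropy and a probabilistic fusion step. Assumptions, all for illustration: fusion is the normalized product (uniform priors), each sample $g^s$ is represented directly by its single-view object confidences $c(o|g^s)$, which the text notes can be precomputed off-line, and the integral of eq. (3.17) is replaced by a weighted mean over the samples with weights proportional to their likelihoods. The sample values are made up.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a normalized confidence vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def fuse_product(previous, current):
    """Assumed probabilistic fusion: normalized product (cf. eq. (3.8))."""
    product = [a * b for a, b in zip(previous, current)]
    total = sum(product)
    return [x / total for x in product]

def delta_h_cautious(current, sample_confs):
    """Eq. (3.16): entropy decrease for the least informative sample."""
    worst = max(entropy(fuse_product(current, s)) for s in sample_confs)
    return entropy(current) - worst

def delta_h_expected(current, sample_confs, weights):
    """Discrete stand-in for eq. (3.17): likelihood-weighted mean entropy."""
    total = sum(weights)
    mean_h = sum(w / total * entropy(fuse_product(current, s))
                 for s, w in zip(sample_confs, weights))
    return entropy(current) - mean_h

current = [0.5, 0.5]                    # two objects still ambiguous
samples = [[0.9, 0.1], [0.6, 0.4]]      # c(o | g^s) for two samples (made up)
print(round(delta_h_cautious(current, samples), 3))          # 0.02
print(round(delta_h_expected(current, samples, [1, 1]), 3))  # 0.194
```

Because a maximum is never smaller than a weighted mean, the cautious value can never exceed the expected one, which is what “staying on the safe side” means here.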


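In complete analogy we define the minimum expected decrease in numerical imprecision
(i.e. the minimum of the expected increase in accuracy) by


$$\Delta I(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = I(\varphi|o_i,g_1,\ldots,g_n) - \max_{s} I(\varphi|o_i,g_1,\ldots,g_n,g^s) \qquad (3.18)$$
or
$$\Delta I(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = I(\varphi|o_i,g_1,\ldots,g_n) - \int_V p(g|o_i,T_{\Delta\psi}\circ T^n\varphi_j)\, I(\varphi|o_i,g_1,\ldots,g_n,g)\, dg. \qquad (3.19)$$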
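Eq. (3.18) can be sketched in the same way. The text leaves the measure $I$ open (chapter 7 studies it, and footnote 6 warns that the variance is not the best choice), so the sketch swaps in the circular variance $1 - |\bar{R}|$ of the pose distribution purely as a stand-in that respects the wrap-around of angles. The grid of four poses in 90° steps, the normalized-product fusion and all sample values are illustrative assumptions.

```python
import math

def circular_variance(conf, step_deg):
    """1 - |mean resultant vector| of a pose distribution on a circular grid.

    Used here only as an illustrative stand-in for I(phi | o_i, ...).
    """
    total = sum(conf)
    c = sum(w * math.cos(math.radians(k * step_deg)) for k, w in enumerate(conf)) / total
    s = sum(w * math.sin(math.radians(k * step_deg)) for k, w in enumerate(conf)) / total
    return 1.0 - math.hypot(c, s)

def fuse_product(previous, current):
    """Assumed fusion of pose confidences: normalized product (cf. eq. (3.9))."""
    product = [a * b for a, b in zip(previous, current)]
    total = sum(product)
    return [x / total for x in product]

def delta_i_cautious(current, sample_confs, step_deg):
    """Eq. (3.18): imprecision decrease for the least informative sample."""
    worst = max(circular_variance(fuse_product(current, s), step_deg)
                for s in sample_confs)
    return circular_variance(current, step_deg) - worst

# Four poses in 90-degree steps; 0 deg and 180 deg are equally likely so far.
current = [0.5, 0.0, 0.5, 0.0]
samples = [[0.9, 0.05, 0.05, 0.0], [0.5, 0.25, 0.25, 0.0]]
print(round(delta_i_cautious(current, samples, 90.0), 3))   # 0.333
```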


Having established these quantities, the expected utility of action $\Delta\psi$, given that the
correct object-pose hypothesis is $(o_i,\varphi_j)$ and the images $I_1,\ldots,I_n$ have been observed, can be
obtained through


$$U(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = f\big(\mathrm{low}(C),\, \mathrm{high}(\Delta H),\, \mathrm{high}(\Delta I)\big) \qquad (3.20)$$


The fuzzy sets low and high are defined on the universes of possible values for cost,
decrease in non-specificity and decrease in numerical imprecision. The fusion function $f$ has
to be conjunctive if all the criteria should be satisfied, averaging if a compromise between
the criteria is wanted, and disjunctive if satisfaction of any of the criteria suffices.
For the sake of readability we have suppressed the arguments of $C$, $\Delta H$ and $\Delta I$ on the
right side of eq. (3.20).
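A sketch of eq. (3.20) with linear ramp membership functions and the three kinds of aggregation named above. Everything quantitative here is a hypothetical choice for illustration: the ramp shape, the ranges (cost in [0, 1], $\Delta H$ in nats up to 0.7, $\Delta I$ up to 30) and the use of min, arithmetic mean and max as the conjunctive, averaging and disjunctive operators.

```python
def high(x, lo, hi):
    """Linear membership rising from 0 at `lo` to 1 at `hi` (hypothetical shape)."""
    if hi <= lo:
        raise ValueError("need lo < hi")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))

def low(x, lo, hi):
    """Complement of `high`: 1 at `lo`, falling to 0 at `hi`."""
    return 1.0 - high(x, lo, hi)

AGGREGATORS = {
    "conjunctive": min,                        # all criteria must hold
    "averaging": lambda *m: sum(m) / len(m),   # compromise between criteria
    "disjunctive": max,                        # any criterion suffices
}

def utility(cost, d_h, d_i, mode, cost_range=(0.0, 1.0),
            dh_range=(0.0, 0.7), di_range=(0.0, 30.0)):
    """Eq. (3.20): f(low(C), high(dH), high(dI)) with the chosen aggregation f."""
    memberships = (low(cost, *cost_range),
                   high(d_h, *dh_range),
                   high(d_i, *di_range))
    return AGGREGATORS[mode](*memberships)

for mode in AGGREGATORS:
    print(mode, round(utility(0.0, 0.35, 6.0, mode), 3))
# conjunctive 0.2
# averaging 0.567
# disjunctive 1.0
```

With zero cost but only modest expected gains, the conjunctive choice rates the action poorly, while the disjunctive choice rates it maximally on the strength of the cost criterion alone.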


In order to avoid capturing images from the same viewing position over and over again,
the score as calculated through eq. (3.15) is multiplied by a mask that disfavors recently
visited positions. Thus the score is transformed before eq. (3.14) is applied:


$$s_n(\Delta\psi) \leftarrow \Big[s_n(\Delta\psi) - \min_{\Delta\psi'} s_n(\Delta\psi')\Big]\, \mathrm{mask}(\Delta\psi) \qquad (3.21)$$


with $\mathrm{mask}(\Delta\psi)$ being low for recently visited viewing positions and the score being
translated to non-negative values. In particular, $\mathrm{mask}(\Delta\psi \approx 0°)$ should be kept low. This
transformation is one way to force the system to take the second best alternative in case
the algorithm would decide not to move at all.
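Eq. (3.21) can be sketched as follows. The mask shape is a hypothetical choice, since the text only requires it to be low near recently visited positions: 0 exactly at a visited position, rising linearly to 1 at 20° away. The raw scores are made up, and the names are illustrative.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def mask(delta_psi, psi_n, visited, width=20.0):
    """Hypothetical mask: 0 at visited positions, 1 from `width` degrees away."""
    target = (psi_n + delta_psi) % 360.0
    nearest = min(angular_distance(target, v) for v in visited)
    return min(1.0, nearest / width)

def masked_scores(scores, psi_n, visited):
    """Eq. (3.21): shift scores so the minimum is 0, then multiply by the mask."""
    s_min = min(scores.values())
    return {a: (s - s_min) * mask(a, psi_n, visited) for a, s in scores.items()}

scores = {0: 0.9, 90: 0.5, 180: 0.6, 290: 0.7}   # raw s_n(delta_psi), made up
new = masked_scores(scores, psi_n=70.0, visited=[0.0, 70.0])
print(max(new, key=new.get))   # 180
```

Staying put ($\Delta\psi = 0°$) had the highest raw score, and $\Delta\psi = 290°$ would return the camera to 0°; both land on visited positions and are masked out, so the system is forced to an alternative.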


Finally, the process terminates if non-specificity and numerical imprecision become 
lower than pre-specified thresholds or if no more reasonable actions can be found (maxi- 
mum score too low). 


3.4 The Complexity of View Planning 


We conclude the introduction of the active fusion algorithm with a discussion of its
complexity. In the following we denote by $n_o$ the number of objects, by $n_\varphi$ the number of
possible discrete manifold parameters (total number of views) and by $n_f$ the number of
degrees of freedom of the setup. Since $n_\varphi$ depends exponentially on the number of degrees
of freedom, we introduce $n_p$, the mean number of views per degree of freedom, such that
$n_\varphi = n_p^{n_f}$. Finally, let us denote by $n_a$ the average number of possible actions. If all
movements are allowed we will usually have $n_a = n_\varphi$.


Before starting the discussion of the complexity of action planning, it is important to
realize that many of the intermediate results which are necessary during planning can be
computed off-line. In eq. (3.16) the quantity $H(o|g_1,\ldots,g_n,g)$ is evaluated for a set of
sample vectors $g = g^1,\ldots,g^{n_s}$ for each of the possible manifold parameters $\varphi_j + \psi_n + \Delta\psi$.
We denote by $n_s$ the number of samples per view-point used for action planning. The
corresponding likelihoods $p(g^r|o_i,\varphi_j + \psi_n + \Delta\psi)$ and confidences $c(o_i|g^r)$, $r = 1,\ldots,n_s$, are
computed off-line such that only the fusion step in eq. (3.8) has to be performed on-line
before computing non-specificity. Hence the complexity of calculating the score for a
particular action $\Delta\psi$ is of order $O(n_o n_\varphi n_s n_o)$.


Computing the scores for all possible actions, however, does not cost $n_a$ times this
amount. If a lookup table is calculated on-line, its complexity is only of order

$$O(n_o n_\varphi n_s n_o + n_o n_\varphi n_a). \qquad (3.22)$$


The first term $n_o n_\varphi n_s n_o$ expresses the order of complexity of calculating the fused
probabilities (and the corresponding average entropies) for all the $n_o n_\varphi n_s$ possible samples
that are used as potential feature vectors for view planning ($n_s$ per view, with $n_o n_\varphi$ being
the total number of views). These average entropies can be stored in a lookup table and
accessed during the calculation of the total average entropy reduction. Thus we need only
$n_o n_\varphi n_a$ additional operations to compute all the scores $s_n(\Delta\psi)$ through eqs. (3.15) and (3.16).
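A back-of-the-envelope check of eq. (3.22): the sketch plugs in the largest sizes quoted for the experiments of chapter 9 ($n_o = 14$, $n_\varphi = 72$, $n_s = 20$) and assumes that all moves are allowed ($n_a = n_\varphi$). It counts dominant operations only, ignoring constant factors, with and without the lookup table; the function names are hypothetical.

```python
def ops_without_table(n_o, n_phi, n_s, n_a):
    """Every action repeats the per-action cost O(n_o * n_phi * n_s * n_o)."""
    return n_a * (n_o * n_phi * n_s * n_o)

def ops_with_table(n_o, n_phi, n_s, n_a):
    """Eq. (3.22): fill the table once, then n_o * n_phi look-ups per action."""
    return n_o * n_phi * n_s * n_o + n_o * n_phi * n_a

sizes = dict(n_o=14, n_phi=72, n_s=20, n_a=72)
print(ops_without_table(**sizes))   # 20321280
print(ops_with_table(**sizes))      # 354816
```

For these sizes the table saves a factor of roughly 57, because the expensive fusion of sample features is shared by all actions that lead to the same view.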


We can also take advantage of the fact that ultimately only hypotheses with sufficiently large
confidences contribute to action planning. This is due to eq. (3.15), which weights each
utility by the confidence of its hypothesis, so that hypotheses with low confidences hardly affect
the score. Hence only the $n_l$ most likely compound hypotheses $(o_i,\varphi_j)$ need be taken into
account. The number $n_l$ is either pre-specified or dynamically computed by disregarding
hypotheses with confidences below a certain threshold. Usually $n_l \ll n_o n_\varphi$, for example $n_l = 10$
(taking $n_l = 2$ imitates the suggestion presented by Sipe and Casasent [124]). With this
simplification we obtain the following estimate for the order of the complexity of the algorithm:


$$O(n_l n_a n_o n_s + n_l n_a) = O\big(n_l n_a (n_o n_s + 1)\big). \qquad (3.23)$$


This can be lowered further if not all possible actions are taken into account ($n_a < n_\varphi$).
The above estimates explain why the algorithm can run in real time in many conceivable
situations, even though it scales exponentially with the number of degrees of
freedom. In fact, since the contributions of each sample and each action can be computed
in parallel, there is great potential for sophisticated real-time applications. In the
experiments to be described in chapter 9, a typical view planning step takes only about one
second on a Silicon Graphics Indy workstation, even though the code has been optimized
towards generality rather than speed and none of the mentioned simplifications has been
used.
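The pruning step behind eq. (3.23) can be sketched as keeping the $n_l$ largest joint confidences, optionally after a threshold. The hypothesis labels and confidences below are made up, $n_l = 10$ is the example value from the text, and the helper names are hypothetical.

```python
def top_hypotheses(joint_conf, n_l=None, threshold=None):
    """Keep the most likely object-pose hypotheses (by count and/or threshold)."""
    ranked = sorted(joint_conf.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [kv for kv in ranked if kv[1] >= threshold]
    if n_l is not None:
        ranked = ranked[:n_l]
    return dict(ranked)

def ops_pruned(n_l, n_o, n_s, n_a):
    """Eq. (3.23): n_l hypotheses times n_s samples (cost n_o each) per action."""
    return n_l * n_a * (n_o * n_s + 1)

conf = {("A", 0): 0.50, ("A", 5): 0.30, ("B", 0): 0.15, ("B", 5): 0.05}
print(list(top_hypotheses(conf, n_l=2)))           # [('A', 0), ('A', 5)]
print(len(top_hypotheses(conf, threshold=0.1)))    # 3
print(ops_pruned(n_l=10, n_o=14, n_s=20, n_a=72))  # 202320
```

Renormalizing the retained confidences is unnecessary for eq. (3.14), since a common positive factor does not change which action attains the maximum score.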


We expect that introducing simplifications into the algorithm will not necessarily
lead to large decreases in overall performance. Instead, subsequent active steps
will usually correct for the few mistakes that may have arisen during the action-planning
phase. If, however, the goal is to truly optimize a certain behavior (fewer steps, lower costs
etc.), then the above algorithm can also be used without simplifications.


The given estimates for the complexity of view planning can be related to the complexity
of action selection in the reinforcement learning approaches which have been advocated
by Lucas Paletta [97, 96]. The two algorithms described by Paletta (which have been briefly
reviewed in section 3.2) are of complexity


$$O(n_o n_a n_h) \quad\text{and}\quad O(n_b n_a),$$


respectively. Here $n_h$ denotes the number of hidden units in the approach using a multi-layer
perceptron and $n_b$ denotes the number of basis functions in the implementation using
a radial basis function network. Both $n_b$ and $n_h$ are functions of the number of objects
$n_o$ and the number of poses $n_\varphi$, because the networks need to scale with the complexity
of the model database. In the considered experiments Paletta uses $n_o = 16$, $n_\varphi = 3 \times 12$
(two degrees of freedom), $n_h = 100$ and $n_a = 5$, $n_\varphi = 12$, $n_b = 5$, while we will be using
$n_o = 8..14$, $n_\varphi = 72$, $n_s = 20$ in chapter 9. Hence, for small problems the complexity
of action selection in the reinforcement scheme is lower than the complexity of action
planning according to the proposed approach. It may well be that this statement also
holds for problems of larger size⁷. In any case, the reinforcement scheme makes action
selection very fast. Its major problem is the time needed for learning the best actions,
which increases dramatically as the number of possible actions $n_a$ increases. Finally, the
approach presented in this thesis, though slower than the reinforcement scheme,
nevertheless allows for action planning in real time in the considered experiments. It is
debatable whether additional speed in action planning is of any practical value, since most
of the time between two observations is actually spent on physical motions of the hardware.


⁷Exact conclusions cannot be drawn because the necessary minimum values of $n_h$ and $n_b$ are not
known beforehand.




3.5 Discussion and Outlook 


After having reviewed object recognition and active fusion we have been able to give a 
concrete outline of a prototypical active fusion algorithm. Indeed, the presentation in the 
previous section 3.3 constitutes a much more detailed description than the general con- 
siderations of sections 3.1 and 1.3. Hence, we have finally reached a state that demands 
a first discussion and outlook. 


The presentation so far has at times been very specific. For instance, we have explicitly
based our discussion on the eigenspace approach in sections 2.2 and 2.3. For the
eigenspace approach we can even give an approximate geometric interpretation of the 
view planning process: Depending on the presently obtained non-specificity of object hy- 
potheses, views are chosen for which the expected feature vector in the eigenspace falls in 
a region where the manifold representations corresponding to the ambiguous object hy- 
potheses are well separated. Well separated implies that the probability distributions (or 
samples) of the corresponding feature values for these objects have only a small overlap 
in these regions. In terms of a distance metric such regions correspond to areas where the 
Mahalanobis distance between the manifold representations is large. 


On the other hand, the given discussion has in various respects still been very general 
in nature, leaving important gaps which have to be filled before any experiments can be 
performed. Much of the rest of the thesis will be devoted to this task. Before doing so, 
however, let us draw some conclusions from the fact that it has been possible to formulate 
the active vision algorithm in such a general form. 


Note, for example, that in the description of the algorithm no reference has been made
to the details of the object recognition algorithm. In fact, the eigenspace method has
not been used apart from making the general arguments more concrete. The features $g(I)$
which are derived from the image $I$ may be of a much more complicated form than they
will be in the actual implementation based upon the eigenspace representation. They
may stand for other unary features like moments, but they may as well include relational
structures. The only things that matter for the formulation of the algorithm are the
existence of modules which allow for the computation of confidences $c(o_i,\varphi_j|g(I))$ and the
possibility to obtain some samples of expected feature vectors $g^s$ for all possible viewing
positions (see eq. (3.16) and its siblings).


We may even go beyond the paradigm of appearance based object recognition. The
presented active vision algorithm does not depend explicitly on the use of appearance
based schemes at all. Using an object-centered representation, the expected feature values
at the target view-point have to be inferred from the current object-pose hypothesis plus
object-centered additional knowledge (such as a CAD model of the object). Generating
the expected feature values $g^s$ from object-centered models is equivalent to an on-line
generation of aspects (or their features) of the hypothesized object (as has been done in
[102]). As long as this can be done, the above active fusion algorithm can also be applied
if object-centered representations are used. In fact, a mixed representation seems to
suggest itself in view of the need to obtain some expected feature values $g^s$.


Finally, and this may perhaps sound astonishing, the algorithm is unaffected even if 
we perform generic object recognition. No changes in the planning step outlined in section 
3.3.3 have to be made as long as we can assume that it is possible to estimate some ex- 
pected feature values g* from the knowledge of the object class to which a specific object 
may belong. It may well happen that large variances in feature values can be expected for 
a generic object recognition system. This will lead to less clear decisions on what actions 
will be best. Still, the given algorithm provides a sound description of how to arrive at
a decision even for generic object recognition schemes.


Since such a universally applicable algorithm has been found it seems that we have 
either achieved a major breakthrough or we have formulated a triviality. The truth may 
lie somewhere in between: We have indeed found a formal representation for the tasks 
of active fusion that goes beyond general outlines and guide-lines. On the other hand, 
the complete algorithm - when stated in plain English - reduces to a very simple and 
intuitive prescription: “Choose that action which - based upon the current state and 
your background knowledge - can be expected to lead to the greatest improvements of 
the recognition results. Having obtained new results integrate them with your previous 
results and repeat the whole cycle until the results are satisfactory.” 


What has been achieved is that the linguistic terms contained in the above prescription 
have found algorithmic analogies in section 3.3: 


• Current State: In our approach the current state of the recognition chain is
completely determined by the current object-pose hypotheses $c(\ldots)$. This also indicates
two of the hard constraints of the algorithm. First, the algorithm does not deal
with improvements during the recognition process itself (for example by performing
active steps during feature extraction or matching). Instead, improvements are
achieved only through complete and successful recognition results during some of
the active fusion steps. We therefore need an object recognition system which is
able to reliably extract object confidences from static scenes, at least in some cases.
Second, we are dealing with an algorithm for single object recognition. The current
state involves no representation for multiple objects or complete scene descriptions.


• Background Knowledge: For action planning some knowledge of what features to
expect is necessary. Thus the action planning module needs some sample features $g^s$.
Using appearance based representations this is easy to achieve⁸. In general this part
of the algorithm can be quite difficult. For instance, it would certainly be a
formidable task to create an active system for generic object recognition and to find
a mechanism to estimate the expected feature values $g^s$ from the knowledge of the
object’s class. However, by doing so we would learn much more about generic object
recognition than about active control mechanisms, because we could still apply the
presented algorithm for active fusion⁹. Another aspect of the background knowledge
needed for action planning again reveals the limitation of the current algorithm to
single-object scenes. The background knowledge does not include any knowledge of how
different objects would occlude each other in multi-object scenes. Self-occlusion is
assumed to be captured by the samples $g^s$, which is automatically true for appearance
based representations, while systems using object-centered representations need some
mechanism to account for self-occlusion. But occlusion caused by one object being located
closer to the camera than another cannot even be formulated with the tools of the given
algorithm. Nevertheless, such effects must of course be taken into account when planning
actions for multi-object scenes, since the most discriminative view might simply not be
visible because of inter-object occlusion.

⁸Which, after all, is one of the reasons why we are using an appearance based approach.


• Choose the Best Action: We have formulated this choice in terms of an optimization
problem (see eq. (3.14)). Our approach to solving that optimization problem is
actually quite primitive: we assume only a limited number of possible actions and
evaluate the expected utility of each action. The above complexity analysis suggests
that for single object recognition an exhaustive search may be quite sufficient,
but it is a shortcoming of the given presentation that no heuristics concerning
action selection are included. For example, in order to find the maximum of
the score $s(\Delta\psi)$ one may use the gradient of $s(\Delta\psi)$ in a local maximization algorithm.
The gradient of the score can only be calculated if the applied measures of
imprecision are differentiable. We will therefore also mention differentiability when
reviewing various measures of imprecision. The maximization of the score can also
be simplified by using explicit rules which encode knowledge about which actions
are likely to be effective under the given circumstances. A similar approach, using
rule-based selection schemes, has been proposed in [29].


• Best action means greatest improvements. So far we have not said anything about how to
measure the quality of the already obtained or expected recognition results. Admittedly,
we have succeeded in reducing the calculation of the expected utility of the next action
to calculating the amount of non-specificity $H(o|g_1,\ldots,g_n,g^s)$ of a set of object
hypotheses, the numerical imprecision $I(\varphi|o_i,g_1,\ldots,g_n,g^s)$ of the pose estimates
and the cost $C(\Delta\psi)$ of a particular action. The cost of an action is strongly application
dependent. One might, for example, attribute higher costs to actions involving bigger
displacements in order to keep overall energy consumption low. We will assume uniform
cost for all actions and will therefore not be concerned with this term in the following.
But the other two terms need to be studied carefully. A large part of this thesis will be
devoted to theoretical considerations on measures of non-specificity and numerical
imprecision. Since planning and the evaluation of results will form a central part of every
active vision algorithm, we consider these theoretical results to be of great importance,
even though not all of them will be applied in this thesis, which deals only with a
limited application domain.


⁹We repeat that our interest in this work focuses on active fusion as applied to a concrete
problem and not on object recognition.
• Information Integration. While eqs. (3.8)-(3.10) give a correct description of the
applied fusion process, they are devoid of meaning unless we specify the actual
fusion operators $F_o$, $F_\varphi$ and $F_{o\varphi}$. When discussing the different implementations
of the algorithm using different uncertainty calculi, we will see that the fusion process in
particular will be quite different for the probabilistic, the possibilistic, the fuzzy
logic and the evidence theoretic implementations.


We are now in a position to clearly state the outline of the remainder of the thesis: 


It will be necessary to give a more detailed description of the information process- 
ing part of the algorithm. Especially the fusion process needs careful examination. This 
will be done by studying different implementations of the presented algorithm in chapter 8. 


It is furthermore absolutely necessary to specify clearly what is meant by “good” recognition results. To this end measures of imprecision have to be studied and reviewed. We will commence this enterprise in the following chapter 4. We repeat that the reader with little interest in theoretical developments may continue with chapter 8 and the discussion of the active fusion approach, since the relevant pieces of the study on measures of imprecision can be reviewed later whenever appropriate. The reader who chooses this path may find the notational definitions in section ?? useful when browsing through the definitions of non-specificity and numerical imprecision in chapters 6 and 7. This by no means implies that the theoretical insights on measures of imprecision are superfluous for the rest of the thesis. Quite the contrary: we will see that some results have an immediate impact on active object recognition. However, we consider the real importance of the theoretical developments to lie in their potential for the considered active fusion approach as well as for other solutions to perhaps more general active vision problems.


The discussion of our active fusion module will become more concrete in chapter 8, and it will finally lead to implemented systems whose performance will be compared experimentally. It is not clear a priori which implementation will perform best or which measure of imprecision will turn out to be particularly useful. We will therefore test a variety of implementations. The presented study will of course still be limited in extent and application domain. Nevertheless we will be able to draw some important conclusions regarding

• the use of different uncertainty calculi for fusion and processing of the confidences, and
• the use of different measures of imprecision.


Besides the active fusion algorithm itself, these two issues will form the major theme 
for the following chapters. 


Chapter 4


Defining Imprecision and Uncertainty


Often we wish systems to be able to make decisions on the basis of currently gathered imperfect information. For this task it is vital to assess the ‘quality’ of the results obtained so far and/or the ‘quality’ of the results to be expected. The major application we are interested in is active object recognition [30, 27], where the system must be able to judge the value of operations aimed at reducing the imprecision associated with object and pose hypotheses. Imperfect information arises in various forms in this context. We discuss them here to provide a context for the following work.


Object Recognition. 


The output of the object classification module is a list of object hypotheses. In single object scenes these hypotheses are mutually exclusive. Let us assume that each hypothesis consists of an object label $o_i$ associated with a numerical value $c_i \in [0,1]$ expressing the system's confidence in recognizing the particular object when presented with the input image. The system is rather confident of having recognized a particular object if only one hypothesis $(o_k, c_k)$ obtains a relatively high confidence value. At the other end of the spectrum the system output may be very imprecise: it may be unclear which object is depicted in the image, and many (or all conceivable) object hypotheses may receive similar (and possibly high) confidence values with no single outstanding hypothesis. We will be concerned with the task of assigning numerical estimates that measure the overall degree of imperfection of the output of the classification module. When the estimate behaves in the way just described, we use the term estimate (or measure) of non-specificity¹. Estimates of non-specificity will be high whenever many hypotheses seem equally likely and minimal if only a single hypothesis receives a high confidence value.


Note that this estimate does not reflect the actual performance of the classifier. A distribution of confidence values may be absolutely specific, favoring only one single hypothesis, while the image in fact shows a different object and the classifier is simply wrong. It goes without saying that we would prefer to measure the actual truth of a set of hypotheses. Lacking measures of correctness (we are not dealing with labeled training data), we restrict the problem to estimating the amount of non-specificity which the system “thinks” is present and try to minimize this quantity². The underlying assumption is that the classification task performs reasonably well and that its internal states correspond to a non-negligible extent to actual external realities. We may then hope to get close to the correct hypothesis by minimizing non-specificity.

¹ Terminology will be discussed in more detail below.


The problem we have described puts another constraint on estimates of non-specificity. Imagine two runs of the system in which the list of object hypotheses is identical but the confidence values are permuted, such that, for example, object one receives as much confidence in the second run as object three did in the first run. Since the overall distribution of confidences remains the same and there is no natural way of labeling the objects (any permutation is as good as any other), we should assign the same value of non-specificity to the object hypotheses in both runs. Estimates of non-specificity must therefore be invariant under permutation of the confidence values.


It should now be clear that, in case the confidence values sum to one (for example because they stand for probabilities), Shannon entropy is a potential candidate for estimating non-specificity in the sense in which we are using the term³. It should be observed that in our context we are estimating the non-specificity of a subjective assignment of confidences. Translated to probability theory, this amounts to the epistemic interpretation of probabilities as (Bayesian) subjective degrees of belief, in contrast to the frequentist, experimental determination of probabilities by counting occurrences of events. Hence estimates of non-specificity are not directly related to measures of information as they are defined in information theory. In our context it is rather the overall subjective specificity of the obtained internal confidences that counts.
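As a minimal sketch, assuming the confidences are non-negative and normalized to sum to one, Shannon entropy can serve as an estimate of non-specificity. The function name `shannon_nonspecificity` and the example confidence vectors are illustrative choices, not taken from the thesis. The example checks the two extremes described above and the required invariance under permutation of the confidence values.

```python
import math
from typing import Sequence


def shannon_nonspecificity(confidences: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a normalized confidence assignment.

    Assumes non-negative confidences over mutually exclusive hypotheses
    that sum to one.
    """
    if any(c < 0 for c in confidences):
        raise ValueError("confidences must be non-negative")
    if not math.isclose(sum(confidences), 1.0, abs_tol=1e-9):
        raise ValueError("confidences must sum to one")
    # c * log2(1/c) equals -c * log2(c); zero confidences contribute nothing.
    return sum(c * math.log2(1.0 / c) for c in confidences if c > 0)


specific = [1.0, 0.0, 0.0, 0.0]     # a single hypothesis is favored
uniform = [0.25, 0.25, 0.25, 0.25]  # all hypotheses equally likely
run_1 = [0.6, 0.1, 0.3, 0.0]        # first run
run_2 = [0.3, 0.0, 0.6, 0.1]        # same confidences, permuted labels

print(shannon_nonspecificity(specific))  # 0.0 (minimal)
print(shannon_nonspecificity(uniform))   # 2.0 (maximal for four hypotheses)
print(math.isclose(shannon_nonspecificity(run_1),
                   shannon_nonspecificity(run_2)))  # True
```

Entropy depends only on the multiset of confidence values, which is exactly why it satisfies the permutation constraint.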

Pose Estimation 


Especially for active systems, we are not only interested in what object is present but also want to know the pose of the object. A pose estimation module delivers confidence values $c(\varphi)$ for the pose hypotheses $\varphi$ (usually for each object separately). To fix ideas, let us assume that $\varphi$ stands for one of the Euler angles of a rigid object, e.g. the angle that describes rotation around the z-axis. Again, pose estimation may result in different hypotheses, and we will proceed to measure the overall imprecision of a given output of the pose estimation task. The situation is very similar to the one described above: the most specific pose estimation has only a single favored pose, while results with many poses obtaining similar confidence values are less specific.


However, there is one more element present in the current problem: the distance between different angles. Imagine the system is sure that only two poses $\varphi_i$ and $\varphi_j$ are possible. Evidently the system's state of knowledge has to be considered less perfect if the poses are $\varphi_i = 0^\circ$ and $\varphi_j = 90^\circ$ than it would be if the two poses were $\varphi_i = 0^\circ$ and $\varphi_j = 1^\circ$. In the first case we are effectively encountering two competing hypotheses, while in the latter case we may only be witnessing the numerical imprecision of the performed measurements. Estimates of non-specificity as we have designed them above assign the same amount of non-specificity to both situations because they are invariant to permutations of the confidences attributed to basic events.

² The use of anthropocentric expressions such as “the system thinks” should help in presenting ideas figuratively. Clearly no sort of consciousness of the machine is implied.

³ Here we purposely avoid the link that exists between Shannon entropy and probability theory because, from the application-driven point of view adopted here, Shannon entropy may as well be used to estimate non-specificity if the confidences are not combined in the way probabilities are fused.


What we need is an estimate of precision that reflects not only whether just a few hypotheses are present but also how far apart or close together these hypotheses are located. To have a distinct term for such measures, we shall call them estimates of numerical precision (of physical measurements). Again, such estimates already exist in a probabilistic context, where often simply the standard deviation is used to estimate the numerical precision of physical measurements.
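The following sketch, under the same normalization assumption, contrasts the two pose situations described above. Shannon entropy (standing in for any permutation-invariant estimate of non-specificity) gives the same value for the poses {0°, 90°} and {0°, 1°}, while a confidence-weighted standard deviation separates them. The helper names are hypothetical, and the standard deviation treats angles as plain real numbers, ignoring the wrap-around at 360°; poses near that boundary would call for a circular statistic instead.

```python
import math
from typing import Sequence


def entropy_bits(conf: Sequence[float]) -> float:
    """Shannon entropy of normalized confidences; pose values play no role."""
    return sum(c * math.log2(1.0 / c) for c in conf if c > 0)


def weighted_std(poses_deg: Sequence[float], conf: Sequence[float]) -> float:
    """Confidence-weighted standard deviation of pose hypotheses (degrees).

    Angles are treated as plain real numbers (no 360-degree wrap-around).
    """
    total = sum(conf)
    mean = sum(c * p for p, c in zip(poses_deg, conf)) / total
    var = sum(c * (p - mean) ** 2 for p, c in zip(poses_deg, conf)) / total
    return math.sqrt(var)


conf = [0.5, 0.5]
competing = [0.0, 90.0]  # two genuinely competing pose hypotheses
close = [0.0, 1.0]       # plausibly just measurement imprecision

print(entropy_bits(conf))             # 1.0 in both situations
print(weighted_std(competing, conf))  # 45.0
print(weighted_std(close, conf))      # 0.5
```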


Classifier Fusion 


In active recognition the system usually makes more than one observation. Images may be captured from different viewpoints, and each processed image results in a classification output. These outputs need to be fused in order to obtain a final decision. Information fusion operators can display conjunctive (severe), disjunctive (indulgent) and compromising (cautious) behavior. Furthermore, the type of behavior of a specific fusion operator may change over the range of the input variables, and all three categories of behavior may be realized in a context-independent or context-dependent way [24]. For the application of context-dependent operators (which also depend on knowledge about the sources to be fused) it may well become important to have a means of quantifying the imprecision associated with the sources.
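To make the three types of behavior concrete, here is a minimal sketch using the simplest context-independent representatives, min (conjunctive), max (disjunctive) and the arithmetic mean (compromise), applied elementwise to the confidence vectors of two views. These particular operators, the function names and the source weights in the last, context-dependent variant are assumptions chosen for illustration; they are not the fusion operators studied in later chapters.

```python
from typing import List, Sequence


def conjunctive(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Severe fusion: a hypothesis keeps only the support both views give."""
    return [min(x, y) for x, y in zip(a, b)]


def disjunctive(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Indulgent fusion: support from one view is enough."""
    return [max(x, y) for x, y in zip(a, b)]


def compromise(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Cautious fusion: the result lies between the two inputs."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def weighted_compromise(a: Sequence[float], b: Sequence[float],
                        w_a: float, w_b: float) -> List[float]:
    """Hypothetical context-dependent variant: each source gets a weight,
    e.g. derived from how imprecise its output is judged to be."""
    return [(w_a * x + w_b * y) / (w_a + w_b) for x, y in zip(a, b)]


view_1 = [0.75, 0.5, 0.125]  # confidences for hypotheses o1, o2, o3
view_2 = [0.25, 0.75, 0.25]

print(conjunctive(view_1, view_2))  # [0.25, 0.5, 0.125]
print(disjunctive(view_1, view_2))  # [0.75, 0.75, 0.25]
print(compromise(view_1, view_2))   # [0.5, 0.625, 0.1875]
print(weighted_compromise(view_1, view_2, 3.0, 1.0))  # [0.625, 0.5625, 0.15625]
```

In the weighted variant the weights are supplied by the caller; a measure of imprecision of each source is one natural candidate for deriving them.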


Overview. 


In part 4 of the thesis we pursue two aims: first, to give a focused overview of existing measures of imprecision, and second, to introduce some new measures that are useful for the applications mentioned above. (For broader reviews of specific types of measures of imprecision see [93], [94], [92] and [69].) It is understood that the measures defined here may also prove useful for distributions of confidence values that arise in other pattern recognition tasks not related to object recognition.


Measures of fuzziness, non-specificity and numerical precision are discussed. Measures of fuzziness are included mainly to relate our estimates of different types of imprecision to existing measures for fuzzy sets. For all these measures the limit sets (consisting of the totally imprecise and the absolutely precise sets) are presented, and the transition from one extreme to the other is defined.


The following chapters contain various original contributions. 
For the first time a geometric approach to measuring various types of imprecision of
fuzzy sets is emphasized throughout. The geometric approach turns out to be fruitful 
because it facilitates the definition of limit sets as well as the calculation of values of 
measures of imprecision for transitory sets. Indeed, the definition of the amount of im- 
precision for fuzzy sets lying between the limit sets will be based upon their geometrical 
distances to the limit sets. 


In section 5.2 a few measures of fuzziness are defined following geometric considera- 
tions. In sections 6.2.1 and 6.2.2 a constructive, geometrically inspired approach is taken 
to define new measures of non-specificity. In section ?? measures of numerical precision 
are discussed. 


Adopting a geometric viewpoint, we will have to measure distances between fuzzy sets in order to measure imprecision. Various definitions of distance between fuzzy sets exist [21]. We will mainly use

$$d_q(F, F') := \left( \sum_{i=1}^{k} \left| \mu_F(u_i) - \mu_{F'}(u_i) \right|^q \right)^{1/q} = \| \mu_F - \mu_{F'} \|_q, \quad \text{with} \quad \|x\|_q := \left( \sum_{i=1}^{k} |x_i|^q \right)^{1/q} \qquad (4.1)$$

because of the ease of geometric interpretation and because it is differentiable. It is understood that $q \in [1, \infty)$. The distance $d_q(F, F')$ falls under the heading of functional approaches to measuring distances in Bloch's review [21]. $d_2$ is the Euclidean distance in $K_U$, while $d_\infty$ (the limit $q \to \infty$) is the distance defined through the max-norm. $d_q$ is invariant to permutations, $d_q(pF, pF') = d_q(F, F')$ with $pF = (\mu_F(u_{p(1)}), \dots, \mu_F(u_{p(k)}))$ and $p: \{1, \dots, k\} \to \{1, \dots, k\}$ being a permutation of the integers $1, 2, \dots, k$. Another noteworthy property of $d_q$ is that, when the supports of $F$ and $F'$ are disjoint, its value does not depend on where in $U$ these supports lie: two disjoint crisp singletons are always at distance $2^{1/q}$, however far apart the corresponding elements of $U$ are. For these reasons $d_q$ will not be used for defining measures of numerical precision.
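A direct transcription of eq. (4.1) makes the two properties just mentioned easy to check numerically. In this sketch a fuzzy set over a finite universe $U = \{u_1, \dots, u_k\}$ is represented, as an assumption of the example, by the list of its membership values $(\mu_F(u_1), \dots, \mu_F(u_k))$; the function names and example sets are illustrative.

```python
import math
from typing import Sequence


def d_q(mu_f: Sequence[float], mu_g: Sequence[float], q: float = 2.0) -> float:
    """Distance of eq. (4.1) between two fuzzy sets over the same universe."""
    if q < 1:
        raise ValueError("q must lie in [1, inf)")
    if len(mu_f) != len(mu_g):
        raise ValueError("fuzzy sets must be defined over the same universe")
    return sum(abs(a - b) ** q for a, b in zip(mu_f, mu_g)) ** (1.0 / q)


def d_inf(mu_f: Sequence[float], mu_g: Sequence[float]) -> float:
    """Limit q -> infinity: the distance induced by the max-norm."""
    return max(abs(a - b) for a, b in zip(mu_f, mu_g))


# Crisp singletons {u1}, {u2} and {u5} in a universe with k = 5 elements.
s1 = [1.0, 0.0, 0.0, 0.0, 0.0]
s2 = [0.0, 1.0, 0.0, 0.0, 0.0]
s5 = [0.0, 0.0, 0.0, 0.0, 1.0]

# Neighboring and distant singletons are equally far apart (sqrt(2) for q = 2).
print(d_q(s1, s2), d_q(s1, s5))

# Permutation invariance: relabel U in the same way for both fuzzy sets.
perm = [4, 2, 0, 3, 1]
f = [0.2, 0.9, 0.4, 0.0, 0.7]
g = [0.6, 0.1, 0.4, 0.3, 0.0]
pf = [f[i] for i in perm]
pg = [g[i] for i in perm]
print(math.isclose(d_q(f, g), d_q(pf, pg)))  # True
```

The first comparison is the reason $d_q$ is unsuitable for numerical precision: it cannot tell a neighboring pose from a distant one.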


In technical terms, the question to be discussed in the following is: what numbers should we assign to the points of $K_U$ (i.e. to the possible fuzzy sets over $U$) in order to obtain reasonable measures of fuzziness, non-specificity or numerical precision? In other words, we want to define a function $H: K_U \to \mathbb{R}$ that can be interpreted as a measure $H(F)$ of the overall imprecision present in $F$.


4.1 Terminology. 


Before continuing, let us discuss terminology. How different authors use terms associated with the notions of imprecision and uncertainty is of interest in itself. It would go far beyond the purpose of this thesis to provide an extensive review of terminological and associated philosophical considerations. Instead we aim at providing a few succinct examples to relate our terminology to existing frameworks. We have chosen two very influential examples: Dubois and Prade's work [47], and Klir and Yuan's categorization [69]. Even when we restrict attention to these two approaches, we will see that comparing terminology is not at all easy: the terms are used not only with different (or even contradictory) meanings, but there are also regions of partial overlap in meaning. This does not necessarily reflect a fundamental arbitrariness in defining terms related to imprecision and uncertainty but may instead indicate that the issue is still the subject of ongoing debate. In fact, we currently witness a dynamic evolution of concepts, expressing itself, for example, in subtle changes in the definition of terms between [68] and [69]. We have taken the most recent references (e.g. [69]) as the basis for the following discussion in order to reflect the latest developments.


4.1.1 Dubois and Prade’s classification. 


The major aspect of their work is to consider imprecision and uncertainty as two comple- 
mentary aspects of a single reality, that of imperfect information. 


The authors consider the following quadruple to be a basic item of information: (attribute, object, value, confidence). The attribute refers to a function $f_{att}$ that attaches a value (or a set of values) to the object: $f_{att}(\text{object}) = \text{value}$. The confidence part of the quadruple indicates the reliability of the item of information. All four entities may be composite for a single basic item. An exemplary basic statement in the object recognition setting would be “Given the input image $I$ the object classification result is $[o_i, o_j, \dots]$ with confidences $[c_i, c_j, \dots]$”⁴. This statement could be translated into

$$\mathrm{Classification}(I) = [o_i, o_j, \dots] \quad \text{with confidences} \quad [c_i, c_j, \dots] \qquad (4.2)$$

or (Classification, input image $I$, $[o_i, o_j, \dots]$, $[c_i, c_j, \dots]$). Clearly this example contains composite entries for the “value” and “confidence” components.


Dubois and Prade relate imprecision to the content of an item of information (i.e. 
the “value” component of the quadruple). They call an item of information precise when 
the subset of its “value” component cannot be subdivided. Depending on what aspect of 
information is being emphasized they speak of an elementary proposition, of a singleton, 
or of an elementary event. If the “value” component can still be subdivided they speak 
of imprecise information. 


An elementary proposition is given by a classification result that focuses on one single object hypothesis, e.g. (Classification, input image $I$, $o_i$, $c_i$). In such a case we can assume that all other possible values $o_k$ receive zero confidence, $c_k = 0$ for $k \neq i$. Note that this corresponds exactly to the situation in which the estimate of non-specificity described in the introduction would be lowest, i.e. specificity would be maximal. Our estimates of non-specificity therefore measure imprecision. The same is true for the estimates of numerical imprecision of physical measurements.
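Dubois and Prade's quadruple maps naturally onto a small record type. The sketch below is a hypothetical encoding (the class `InfoItem` and its field names are ours, not taken from [47]) in which an item counts as precise when exactly one of its candidate values retains non-zero confidence, i.e. when it reduces to an elementary proposition as in the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InfoItem:
    """(attribute, object, value, confidence) with composite value/confidence."""
    attribute: str
    obj: str
    values: Tuple[str, ...]
    confidences: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.values) != len(self.confidences):
            raise ValueError("one confidence per candidate value is required")

    def is_precise(self) -> bool:
        # Precise: only a single value keeps non-zero confidence, so the
        # "value" component cannot be subdivided any further.
        return sum(1 for c in self.confidences if c > 0) == 1


ambiguous = InfoItem("Classification", "input image I",
                     ("o1", "o2", "o3"), (0.7, 0.6, 0.1))
elementary = InfoItem("Classification", "input image I",
                      ("o1", "o2", "o3"), (0.9, 0.0, 0.0))

print(ambiguous.is_precise())   # False
print(elementary.is_precise())  # True
```

The confidences need not sum to one here, matching the assumption $c_i \in [0,1]$ made for the classifier output.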


Uncertainty, as understood by Dubois and Prade, relates to the truth of an item of information, understood as its conformity to reality (the “confidence” component of the quadruple). Expressions like “probable”, “likely”, “possible”, “necessary”, “plausible” or “credible” may serve as qualifiers for uncertainty.

⁴ Other examples may be found in [47]. We will use only the active recognition application to illustrate concepts.


Having established this framework Dubois and Prade state the following duality: Given 
a set of items of information, the opposition between precision and uncertainty is expressed 
by the fact that in rendering the content of a proposition more precise, one will tend to in- 
crease its uncertainty. For example in a given setting the statement “The object’s pose is 
75.82°” may be very precise while the statement “The object’s pose is somewhere around 
70°” is much less precise. While imprecision is lower in the first case than in the second, 
uncertainty is usually higher in the first case: given that we use the same means in the 
same ways to attach confidences to the above statements the confidence part of the first 
statement will usually be lower than that for the second statement as it requires more 
effort to attach high degrees of confidence to very precise statements. 


Dubois and Prade observe that there are other qualifiers that refer to imprecision such 
as “vague”, “fuzzy” or “ambiguous”. Vagueness or fuzziness in an item of information 
resides in the absence of a clear boundary to the set of values attached to the objects 
it refers to. The value field is then usually represented by a fuzzy set such as “high”, 
“many” etc. The authors speak of ambiguity to the extent an item of information refers 
to several possible contexts or reference sets. That is, the domain and/or the image of
the function $f_{att}(\text{object})$ are not uniquely specified (and consequently the same is true
for the specific form of the function itself).


Setting our work in relation to the terminology of Dubois and Prade, we may say that we try to estimate the amount of different types of imprecision in basic items of information. Non-specificity and numerical imprecision are types of imprecision directly related to the above definition of a singleton subset being the most precise or elementary event.


4.1.2 The classification of Klir and Yuan. 


Klir and Yuan [69] recognize a few basic types of uncertainty. First, the notion of uncertainty is subdivided into the two categories of fuzziness (or vagueness) and ambiguity. The latter is again separated into two categories: conflict (or discord) and non-specificity. Figure 4.1 depicts the classification. Note the many synonyms or special instances of the basic categories that are given by Klir and Yuan. Table 4.1 lists the definitions of the terms.


Term              Definition
Fuzziness         Lack of definite or sharp distinctions.
Ambiguity         One-to-many relationships.
Discord           Disagreement in choosing among several alternatives.
Non-specificity   Two or more alternatives are left unspecified.


Table 4.1: Variants of uncertainty according to Klir and Yuan [69].
[Figure 4.1, tree diagram: Uncertainty splits into Fuzziness (lack of definite or sharp distinctions: vagueness, cloudiness, haziness, unclearness, indistinctness, sharplessness) and Ambiguity (one-to-many relationships). Ambiguity splits into Discord, Conflict (disagreement in choosing among several alternatives: dissonance, incongruity, discrepancy) and Nonspecificity (two or more alternatives are left unspecified: imprecision, variety, generality, diversity, equivocation).]


Figure 4.1: Basic types of uncertainty according to Klir & Yuan [69].


It appears to us that the term “uncertainty” as used by Klir and Yuan describes 
approximately the fundamental reality that has been termed “imperfect knowledge” by 
Dubois and Prade. The types of “uncertainty” listed by Klir and Yuan certainly do 
not correspond directly to what Dubois and Prade call “uncertain information”. Instead 
they appear to be more closely related to different types of “imprecise information” in 
Dubois and Prade’s terminology. Indeed, the term “uncertainty” as used by Klir and 
Yuan seems to have some overlap with both “imperfect knowledge” and “imprecision” 
as used by Dubois and Prade. Since the latter use “uncertainty” in a very specific sense 
(related to the truth of a statement) there is no specific term describing this in Klir and 
Yuan’s categorization. A possible translation could be “degree of belief as expressed by 
probabilities, possibilities etc.”. We may therefore be tempted to establish the (simplistic) 
rules of translation given in table 4.2. 


Using this table it is possible to restate the above duality between imprecision and uncertainty (Dubois and Prade's terminology) in Klir and Yuan's terms: in rendering the content of a proposition less uncertain, one will tend to decrease its degree of confidence.


Klir and Yuan point out different types of ambiguity which are all instances of the same basic situation: multiple alternatives are available and it is not possible to favor a specific single alternative. This is obviously related to the definition of imprecision by Dubois and Prade, who consider mappings to singletons to be the most precise information items. Klir and Yuan go on to subdivide this notion into the categories of discord and non-specificity. Non-specificity may be used in case a single decision maker cannot unambiguously choose one single alternative, while discord expresses the idea that multiple alternatives contradict each other. This may arise if multiple decision makers are to decide on a specific point.
We can therefore relate our work to Klir and Yuan's terminology. According to them, our estimates of non-specificity and numerical imprecision are measures of specific types of ambiguity, which they call non-specificity and imprecision.


Dubois and Prade        Klir and Yuan
Imperfect Information   Uncertainty
Imprecision             Sub-types of Uncertainty: Fuzziness and Ambiguity,
                        Non-specificity and Discord, Numerical Imprecision
Fuzziness, Vagueness    Fuzziness, Vagueness
Ambiguity               Sub-type of ambiguity: Non-specificity in frame of reference
Uncertainty             Degree of Confidence, Belief, Probability etc.


Table 4.2: A suggested, very crude and approximate translation table relating Dubois and Prade's terminology to Klir and Yuan's terminology. The table has to be used with care since the different terms do not convey exactly the same meaning for the mentioned authors.


4.1.3 Our Choice of Terminology. 


We need to stick to a definite terminology in the following to avoid confusion. From a 
practical point of view the choice we make is not really critical because it does not affect 
the actual presentation of mathematical expressions, nor does it influence our decision 
on when to use what expression in the applications to follow. For the epistemic inter- 
pretation of the presented framework the issue as to what kind of terminology is to be 
preferred may well be of importance. When seen from a general perspective this issue 
is still unsettled. For the application we have in mind our choice has been driven by 
the motivation to observe the duality between imprecision and uncertainty as expressed 
by Dubois and Prade. In the following we stick more closely to their terminology, us- 
ing “imperfect information” as the abstract term which has different realizations such as 
“imprecision” and “uncertainty” in the sense of Dubois and Prade. The “uncertainty” of 
basic information items is therefore measured directly by the confidence values while the 
overall “imprecision” of an information item has yet to be assigned a numerical value. It 
is our goal to provide such measures for specific applications. Thus the measures we will 
discuss are all considered to be estimates of different types of “imprecision”. We differ
from Klir and Yuan (and other authors [93]) in that we do not use the term uncertainty
measure for estimates of different types of imprecision.


Nevertheless, the terms we finally use (fuzziness, non-specificity and numerical imprecision) also correspond closely to the ideas associated with them by Klir and Yuan. We go one step further and follow Klir and Yuan in considering non-specificity and numerical imprecision as different types of ambiguity. Thus, whenever we speak of a measure of ambiguity we mean either a measure of non-specificity or a measure of numerical imprecision. We therefore depart from the rather narrow definition of ambiguity given by Dubois and Prade, who call an item of information ambiguous to the extent that it refers to several possible contexts or reference sets. In fact we shall not deal with that situation, assuming that the context and the reference set are known and remain fixed. We will still speak of ambiguity in the way the term is used by Klir and Yuan.


Note that there are other variants of imprecise information which we do not consider further here, such as ignorance. The lack of knowledge implied by the notion of ignorance can be modeled explicitly in possibility theory and evidence theory. Ignorance can be due to different factors, e.g. high non-specificity. Indeed, one may even use ignorance as a very general term, in the same way Dubois and Prade use imperfect information and Klir and Yuan use uncertainty, to describe the basic reality behind imprecise, uncertain and lacking information [75]. If one is willing to accept such a broad definition of ignorance then, of course, all the measures described in the following also measure some sort of ignorance.


We will also avoid using the term measures of information in the sense that basic items 
of information are considered to be more informative if the information encoded by them 
is more precise. This is done, not because there is no rationale behind these expressions, 
but mainly to avoid confusion since the concept of information has already been given a 
well defined (and different non-subjective) meaning in information theory. We continue 
by adding some comments on the terms we will use and the interpretation we will attach 
to them. 


Fuzziness, Non-specificity and Numerical Imprecision. 


Today the concept of measures of fuzziness is well established and standardized. This is reflected in the uniform definition of the term by different authors: a measure of fuzziness is a global measure of how strongly a fuzzy set as a whole deviates from a crisp set. Such measures of fuzziness are often aggregations of local measures of fuzziness, which are functions of the singletons' membership values. Fuzziness is a relevant notion in the classical interpretation of fuzzy sets as encodings of vague categories (as perceived, for example, by the average individual).
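As an illustration of a global measure built from local ones, the sketch below uses the local measure $\min(\mu, 1-\mu)$, which is zero for crisp memberships and largest at $\mu = 0.5$, and aggregates it as twice its mean over the universe. This equals twice the normalized Hamming distance between the fuzzy set and its nearest crisp set, so it is 0 for crisp sets and 1 when every membership equals 0.5. It is one classical choice, not necessarily one of the measures defined in section 5.2, and the function names are illustrative.

```python
from typing import Sequence


def local_fuzziness(mu: float) -> float:
    """Local measure: distance of one membership value to the nearer of 0 and 1."""
    return min(mu, 1.0 - mu)


def linear_fuzziness(memberships: Sequence[float]) -> float:
    """Global measure in [0, 1]: twice the mean local fuzziness.

    Equals 2/k times the Hamming distance to the nearest crisp set.
    """
    k = len(memberships)
    if k == 0:
        raise ValueError("the universe must not be empty")
    return 2.0 * sum(local_fuzziness(m) for m in memberships) / k


print(linear_fuzziness([0.0, 1.0, 1.0, 0.0]))    # 0.0 (crisp set)
print(linear_fuzziness([0.5, 0.5, 0.5, 0.5]))    # 1.0 (maximally fuzzy)
print(linear_fuzziness([0.25, 0.75, 1.0, 0.0]))  # 0.25
```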


Regarding ambiguity we will deal with two subclasses: non-specificity and numerical imprecision. We speak of non-specificity when the singletons represent multiple alternatives for one single “opinion-holder” or “expert” who is unable to favor one specific alternative, for example because multiple singletons receive a high membership value. Note that the interpretation of the fuzzy set is different for measures of fuzziness and measures of ambiguity. Typically, ambiguity is a pertinent concept if we want to choose one singleton or alternative among others based on its membership. In these cases the fuzzy sets can stand for (possibly sub-normal) possibility distributions or for distributions of confidences that are similar to probability distributions.
The basic alternatives provided by the singletons may be mutually exclusive (as in probability theory) or not. We shall restrict our attention mainly to the case that the alternatives are indeed mutually exclusive and each singleton represents only one basic alternative. For some of the considered measures the generalization to the case where various alternatives are not mutually exclusive will be straightforward. For other approaches the immediate way to deal with the more general case is to enlarge the underlying universe U such that combinations of basic alternatives are added to the set of basic alternatives⁵. Because of this restriction we may to some extent have a similar
interpretation for membership values and Bayesian probabilities in all those cases where 
we are interested in measures of ambiguity: both stand for numerical values quantifying 
the system’s belief in the presence of the event associated with the considered singleton. 
Even though the interpretation may differ greatly (as can of course be the case since we 
are dealing with different models) the mathematical framework can be set into relation 
and we will profit from mathematical expressions for measures of imprecision defined in 
different theories of uncertain knowledge. 


Metric vs. Non-Metric Universes 


Obviously the definition of measures of imprecision for fuzzy sets depends also on the 
nature of the underlying universe of discourse U. We suggest a broad subdivision into 
the two cases of non-metric universes and universes that are endowed with a metric or 
topology. Metric and non-metric universes put different constraints on the properties of 
measures of imprecision. 


Numerical Imprecision vs. Non-Specificity: The role of the metric. 


The term numerical imprecision will be used to describe the result of a measurement 
that delivers several alternative singletons whose distances to each other can also be mea- 
sured. Consequently we will speak of numerical imprecision only in case the universe 
of discourse is endowed with a metric (or a topology) such that scales or neighborhoods 
can be defined. In other words numerical imprecision will be used to express not only 
non-specificity - two or more alternatives are left unspecified - but also to imply that 
we can measure how wide-spread these alternatives are in the current universe U. The 
concept of numerical imprecision - as used in this work - becomes relevant whenever the 
singletons are physically measurable quantities such as light intensity, coordinate values 
etc. These quantities are encountered at the data, sensor and feature levels of
a pattern recognition application and especially in connection with fuzzy numbers. We
share the interest in this quantity with the robotics community. For instance, Saffiotti 
has proposed to represent numerical imprecision (as well as fuzziness, ambiguity and un- 
reliability) by similarity to certain types of fuzzy sets [116]. 


If there is no topology or metric on U (e.g. because no underlying physical measurement is possible) we speak of non-specificity when referring to ambiguous one-to-many relationships. This will, for example, happen when measuring the non-specificity of the classification results of certain classifiers. Such results can arise at various levels of a pattern recognition task, but they are found most prominently at the highest level, the decision level.

⁵ The problem with this approach is that the number of alternatives increases drastically as binary or higher combinations of events are considered to be basic alternatives.


Fuzziness and Metric. 


In principle fuzziness is a meaningful set-theoretic concept irrespective of whether a metric exists on the universe U or not. However, measures of fuzziness are characteristically most useful when the universe of discourse U is endowed with a topology that can be interpreted physically (most often induced by a metric). In that case a neighborhood of each singleton $u_i$ can be defined and fuzziness becomes a measure with physical significance (the degree of blurring). The fuzzy sets used in control applications are of this type. The defuzzification step which is commonly applied to obtain one output value is usually some kind of weighted average of the singletons. The implicit assumption underlying this procedure is again, most often, that a neighborhood can be defined for each singleton (for example by a metric).


In light of this observation, it is interesting to note that, even though the universe of discourse on which fuzziness is defined is very often endowed with a metric, none of the hitherto proposed measures of fuzziness (including the measures discussed in what follows) makes use of that metric information. This is the case even though, for curved universes U with a locally changing metric (such as curved surfaces), including the local metric in measures of fuzziness is the only reasonable way to proceed. Indeed, for such universes all measures of imprecision need to be made covariant, i.e. invariant under changes of the (arbitrary) coordinate system. Perhaps engineers have not yet recognized the need for such a theory because work-around solutions for simple problems involving curved manifolds can be found without it®, while scientists working with intrinsically curved manifolds in other fields have not yet found applications for fuzzy set theory.


4.1.4 Basic Requirements for Measures of Imprecision 


The above discussion can also be grounded differently, in a way that brings us closer to the technical aspects of this work. There are basically four major parts in the definition of measures of imprecision for fuzzy sets. Whenever we propose a set of requirements we will be especially interested in

• the most imprecise configuration(s),
• the configuration(s) displaying least imprecision,
• the transition between these two extrema,
• and other properties such as the presence or absence of invariances.


°For example, if the manifold is topologically equivalent to a (possibly pseudo-)Euclidean space a change of coordinates can solve the problem.


It is especially the first two points that serve to distinguish the different types of imprecision which we consider. Table 4.3 shows that fuzziness is characterized by the two extrema of crisp sets on the one hand and the set with all memberships equal to 0.5 on the other hand. Non-specificity and numerical imprecision range between singleton hypotheses (least imprecise) and the set with all membership values equal to 1.0 (most imprecise).


Type of Imprecision | Least Imprecise Sets | Most Imprecise Sets | Chapter
Fuzziness | Crisp sets in $C_f$ (only 1 and 0 memberships) | $F_{0.5}$ | 5
Non-Specificity | Singletons in $C_s$, or sets with only a few singletons | Equal memberships, especially $F_1 = (1, \ldots, 1)$ | 6
Numerical Imprecision | Singletons, or sets with only a few singletons next to each other | Equal memberships, especially $F_1 = (1, \ldots, 1)$ | 7


Table 4.3: The considered types of uncertainty and their extremal or limit sets. The rightmost column gives the corresponding chapter of this thesis.


Thus every measure of imprecision H splits the space of possible fuzzy sets $K_f$ into three parts:


• The region $U_H \subset K_f$ of most imprecise sets: $F \in U_H \Leftrightarrow H(F) = \max_{F'} H(F')$.
• The region $V_H \subset K_f$ of least imprecise sets: $F \in V_H \Leftrightarrow H(F) = \min_{F'} H(F')$.
• The transitory region $W_H \subset K_f$: $F \in W_H \Leftrightarrow \min_{F'} H(F') < H(F) < \max_{F'} H(F')$.


The subscripts on $U_H$, $V_H$ and $W_H$ indicate that the exact location of these regions depends on the considered measure H. In the following, the definitions of the two extremal or limit regions $U_H$ and $V_H$ will be postulated, while the values of the considered measures of imprecision for intermediate sets $F' \in W_H$ will be calculated from their distance to the limit sets. Various definitions of distance between fuzzy sets exist [21, 22]. We will mainly use


$$ d_q(F, F') := \left( \sum_{i=1}^{k} |\mu_i - \mu'_i|^q \right)^{1/q} \qquad (4.3) $$


because it is easy to interpret geometrically and because it is differentiable. It is understood that $q \in [1, \infty)$. In Bloch's review [21, 22] the distance $d_q(F, F')$ falls under the heading of functional approaches to measuring distances. $d_2$ is the Euclidean distance in $K_f$, while $d_\infty$ is the distance defined through the max-norm. $d_q$ is invariant to permutations, $d_q(PF, PF') = d_q(F, F')$, with $PF = (\mu_F(u_{p(1)}), \ldots, \mu_F(u_{p(k)}))$ and $p : \{1..k\} \to \{1..k\}$ being a permutation of the integers $1, 2, \ldots, k$. Another noticeable property of $d_q$ is that its value does not reflect how far apart in U the supports of F and F' lie once they are disjoint; two distinct singletons, for example, are always at distance $2^{1/q}$. For these reasons it will not be used for defining measures of numerical imprecision.
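A minimal Python sketch of the distance (4.3), assuming fuzzy sets are given as equal-length tuples of membership values and treating `q = math.inf` as the max-norm limit; the helper names `d_q` and `permute` are illustrative, not taken from the thesis. It checks permutation invariance and shows that two distinct singletons are equally far apart regardless of which elements of U they sit on.

```python
import itertools
import math


def d_q(F, G, q=2.0):
    """Distance (4.3) between two fuzzy sets given as sequences of memberships."""
    if len(F) != len(G):
        raise ValueError("fuzzy sets must be defined on the same universe U")
    if math.isinf(q):
        # limiting case q -> infinity: distance induced by the max-norm
        return max(abs(a - b) for a, b in zip(F, G))
    return sum(abs(a - b) ** q for a, b in zip(F, G)) ** (1.0 / q)


def permute(F, p):
    """PF = (mu_F(u_p(1)), ..., mu_F(u_p(k))) for a permutation p of 0..k-1."""
    return tuple(F[p[i]] for i in range(len(F)))


F = (0.9, 0.2, 0.4)
G = (0.1, 0.7, 0.4)
for p in itertools.permutations(range(len(F))):
    # permutation invariance: d_q(PF, PG) == d_q(F, G)
    assert math.isclose(d_q(permute(F, p), permute(G, p)), d_q(F, G))

# adjacent and distant singletons in U are equally far apart for d_q
print(d_q((1, 0, 0, 0), (0, 1, 0, 0)), d_q((1, 0, 0, 0), (0, 0, 0, 1)))
```

For q = 2 both printed values equal the square root of 2, which is exactly why $d_q$ is unsuitable for numerical imprecision.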
Our approach can be compared to the usual procedure of defining measures of imprecision through a relation $F \preceq G$ which expresses that fuzzy set F is considered less imprecise than G:

$$ F \preceq G \;\Rightarrow\; H(F) \le H(G) \qquad (4.4) $$


The advantage of the geometric approach relying on distances is that it gives us a tool to define both the relation on the left side of the above implication and the measure on the right side. From the practitioner's point of view this is a very convenient situation: F is considered less imprecise than G if it is closer to $V_H$ and farther from $U_H$. (The distance measure has to be chosen such that these two conditions are equivalent; otherwise the geometric approach makes no sense.)


Observe that the cardinalities of the two limit regions $U_H$ and $V_H$ may differ. For example, in the case of fuzziness there is only one single most fuzzy set, $U_H = \{F_{0.5}\}$. Since our measures of imprecision are based upon distances to the limit sets, such advantageous situations will be exploited. Whenever possible we will define measures of imprecision for which only a few distances have to be calculated. Consequently we will often decide in favor of one of the two limit regions for the calculation of distances. Indeed, we will even use approximate solutions by comparing distances to regions $S \subset K_f$ that are not the true limit regions of H but for which very compact expressions of distance can be found. The enormous practical advantages of this procedure outweigh the necessary abandonment of some useful theoretical properties, such as the partial ordering induced by $\preceq$, which is always respected when the true limit regions are used. We regard the ease with which such approximate but compact definitions of measures of imprecision can be established as another advantage of the geometric approach. Since some of the presented measures of imprecision will be approximate, we use the term estimate of imprecision synonymously with measure of imprecision.


4.2 Overview 


In each of the following chapters we will first give a focused overview of existing measures 
of imprecision, and then introduce some new measures. For broader reviews of specific 
types of measures of imprecision see [93],[94],[92] and [69]. It is understood that the 
measures defined here may also prove useful for distributions of confidence values that 
arise in other pattern recognition tasks which are not related to object recognition. We 
will discuss 


• measures of fuzziness in chapter 5,
• measures of non-specificity in chapter 6, and
• measures of numerical imprecision in chapter 7.


We have included measures of fuzziness even though we will not apply them in our ac- 
tive fusion algorithm. By including measures of fuzziness the presentation covers most of 
the existing measures of imprecision for fuzzy sets and distributions of confidence values. 
In light of the general problem of estimating the “quality” of results in active fusion we
consider it to be of great interest to discuss as many types of imprecision as possible. One 
can easily imagine situations in which fuzziness will play a role in deciding whether an 
obtained or expected output should be regarded to be of “high quality” or of “low quality”. 


Furthermore, we will see that there is some terminological confusion regarding quite a few measures of fuzziness. They carry names referring to the term “entropy”, which is actually only appropriate for measures of non-specificity. The situation becomes even more involved because, for sum-normal distributions of confidences, these measures (which are intended to estimate fuzziness in $K_f$) can indeed also be used to measure non-specificity in $K_s$. For these reasons we begin our study of measures of imprecision with the notion of “fuzziness”.


In technical terms the question to be discussed in the following is: what numbers should we assign to the points of $K_f$ (i.e. to the possible fuzzy sets over U) in order to obtain reasonable measures of fuzziness, non-specificity or numerical imprecision? In other words, we want to define a function $H : K_f \to \mathbb{R}$ that can be interpreted as a measure $H(F)$ of the overall imprecision present in F.


We will not explicitly treat the following problems, which are left for future research:
• Continuous universes of discourse U.
• Mutually non-exclusive basic alternatives for measures of ambiguity.
• Use of metric information on U (apart from its use for measures of numerical imprecision).


Chapter 5 


Measuring Fuzziness. 


5.1 A Review of Measures of Fuzziness. 


A great variety of measures of fuzziness already exists. We will first review some of the most important ones and then go on to elaborate on the previously mentioned ideas to define new measures of fuzziness. We discuss measures of fuzziness mainly for two reasons. First, we want to put the measures of non-specificity and numerical imprecision into context with the bulk of existing measures of fuzziness. Second, we want to use a well-known test-bed to exemplify what lies behind the geometric approach to the definition of measures of imprecision. A more detailed discussion of measures of fuzziness can be found, for example, in [92].


Table 5.1 gives a list of some common measures of fuzziness. All these measures fulfill 
most of the following properties suggested by DeLuca and Termini [41], Ebanks [49] and 
others: 


Requirements for measures of fuzziness: 


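1. Sharpness: $H(F) = 0 \Leftrightarrow F \in C_f$, i.e. $\mu_i = 0$ or $1$ for all i.
2. Maximality: $H(F)$ is maximum $\Leftrightarrow \mu_i = 0.5$ for all i.
3. Resolution: $H(F) \ge H(F^*)$ whenever $F^*$ is a sharpened version of F, i.e. $\mu^*_i \le \mu_i$ if $\mu_i \le 0.5$ and $\mu^*_i \ge \mu_i$ if $\mu_i \ge 0.5$.
4. Symmetry: $H(F) = H(\bar{F})$, with $\bar{F} = 1 - F \Leftrightarrow \bar{\mu}_i = 1 - \mu_i$.
5. Valuation: $H(F \cup G) + H(F \cap G) = H(F) + H(G)$.

For example, $H_{DTE}$¹ fulfills all properties (1)-(5). So do $H_{PPE}$, $H_{FQE}$ and $H_{Kau}$ for $q = 1$. $H_{Yag}$, $H_{BPE}$, $H_{Kos}$ and $H_{Kau}$ satisfy requirements (1)-(4).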
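As a numerical sanity check (not a proof), the following Python sketch tests requirements (1)-(5) for the DeLuca-Termini entropy $H_{DTE}$ on random fuzzy sets. It assumes the natural logarithm, $\alpha = 1$, pointwise max and min for union and intersection, and a hypothetical helper `sharpened` that produces one particular sharpened version of F; maximality is only checked in the "no set exceeds $F_{0.5}$" direction.

```python
import math
import random


def h_dte(F, alpha=1.0):
    """DeLuca-Termini generalized entropy (Table 5.1), with 0 * log 0 := 0."""
    def xlogx(x):
        return 0.0 if x == 0 else x * math.log(x)
    return -alpha * sum(xlogx(m) + xlogx(1 - m) for m in F)


def union(F, G):         # standard (max) union
    return tuple(max(a, b) for a, b in zip(F, G))


def intersection(F, G):  # standard (min) intersection
    return tuple(min(a, b) for a, b in zip(F, G))


def complement(F):
    return tuple(1 - m for m in F)


def sharpened(F, t=0.5):
    """One sharpened version of F: each membership moves a fraction t
    of the way towards its nearer crisp value (0 if <= 0.5, else 1)."""
    return tuple(m * (1 - t) if m <= 0.5 else m + t * (1 - m) for m in F)


random.seed(0)
k = 4
for _ in range(1000):
    F = tuple(random.random() for _ in range(k))
    G = tuple(random.random() for _ in range(k))
    assert h_dte(F) <= h_dte((0.5,) * k) + 1e-12                # (2) maximality
    assert h_dte(sharpened(F)) <= h_dte(F) + 1e-12               # (3) resolution
    assert math.isclose(h_dte(F), h_dte(complement(F)))          # (4) symmetry
    assert math.isclose(h_dte(union(F, G)) + h_dte(intersection(F, G)),
                        h_dte(F) + h_dte(G))                     # (5) valuation
assert h_dte((1, 0, 1, 1)) == 0.0                                # (1) sharpness
```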


Sharpness: this property defines the region of minimal fuzziness in $K_f$, which is the set of crisp sets $C_f$. All crisp sets display zero fuzziness, i.e. measures of fuzziness do not distinguish between the sets $(1, 0, \ldots, 0)$ and $(1, 1, \ldots, 1)$.


¹See table 5.1 for the mathematical expressions.
Author(s) | Name | Definition
DeLuca and Termini [41] | Generalized Entropy | $H_{DTE}(F) := -\alpha \sum_{i=1}^{k} [\mu_i \log \mu_i + (1-\mu_i) \log(1-\mu_i)]$
Yager [134] | | $H_{Yag}(q, F) := 1 - \frac{d_q(F, \bar{F})}{k^{1/q}}$, with $\bar{\mu}_i = 1 - \mu_i$
Kaufmann [66] | Index of Fuzziness | $H_{Kau}(q, F) := \frac{2}{k^{1/q}} d_q(F, F^{near})$, with $d_q(F, F') = \left( \sum_{i=1}^{k} |\mu_i - \mu'_i|^q \right)^{1/q}$, $q \in [1, \infty)$, and $\mu^{near}_i = 1$ if $\mu_i > 0.5$, 0 otherwise
Kosko [71] | | $H_{Kos}(q, F) := \frac{d_q(F, F^{near})}{d_q(F, F^{far})}$
Pal and Pal [95] | | $H_{PPE}(F) := \alpha \sum_{i=1}^{k} [\mu_i e^{1-\mu_i} + (1-\mu_i) e^{\mu_i} - 1]$
Bhandari and Pal [16] | Entropy | $H_{BPE}(\alpha, F) := \frac{1}{k(1-\alpha)} \sum_{i=1}^{k} \log(\mu_i^\alpha + (1-\mu_i)^\alpha)$
Fermi, in [38] | Quadratic Entropy | $H_{FQE}(\alpha, F) := \alpha \sum_{i=1}^{k} \mu_i (1-\mu_i)$


Table 5.1: Some common measures of fuzziness. Authors, names and definitions. The fuzzy set F is defined through the vector $\mu = (\mu_1, \ldots, \mu_k)$ of membership values.
[Figure 5.1 panels: (a) $H_{DTE}(F)$, (b) $H_{PPE}(F)$, (c) $H_{Kau}(q = 1, F)$, (e) $H_{Kos}(q = 1, F)$, (f) $H_{Kos}(q = 2, F)$, (i) $H_{Yag}(q = 1.5, F)$, (j) $H_{BPE}(2, F)$]


Figure 5.1: Measures of fuzziness for fuzzy sets $F = (\mu_1, \mu_2)$. The two horizontal axes represent the two membership values $\mu_1$ and $\mu_2$, while the vertical axis represents the value of fuzziness.
Maximality: through this property the region of maximal fuzziness is defined. It is the single central fuzzy set $F_{0.5}$.


Resolution: this is a general requirement for the transition from the limit sets of minimum fuzziness (the crisp sets) to the limit set of maximum fuzziness.


Symmetry and Valuation: these are two additional requirements which are not directly related to the definition of limit sets and the transition between them. The symmetry property states that a fuzzy set should be regarded as being as fuzzy as its complement. The valuation property is an algebraic property reminiscent of measures of volume. It is sometimes omitted.


For finite U, Knopfmacher [70] and Loo [80] have proposed a general mathematical form for H(F):

$$ H(F) = f\left( \sum_{i=1}^{k} c_i \, g_i(\mu_i) \right) \qquad (5.1) $$


with constants $c_i \in \mathbb{R}^+$ and functions $g_i : [0,1] \to \mathbb{R}^+$ such that $g_i(0) = g_i(1) = 0$, $g_i(t) > 0$ for all $t \in (0,1)$, $g_i(t) < g_i(0.5)$ for all $t \in [0,1] \setminus \{0.5\}$, $g_i$ non-decreasing on $[0, 0.5)$ and non-increasing on $(0.5, 1]$, and $g_i(t) = g_i(1-t)$. $f : \mathbb{R}^+ \to \mathbb{R}^+$ is monotone increasing. To fulfill the valuation property f has to be linear. Adding the property of symmetry under permutations of the singletons $u_i$, Ebanks [49] obtained the result that H(F) has to be given by $H(F) = \sum_{i=1}^{k} g(\mu_i)$. A typical example, $g(t) := -t \log(t) - (1-t) \log(1-t)$, is depicted in figure 5.2.
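Why the additive form satisfies valuation can be seen in one line, assuming union and intersection are the pointwise max and min (other t-conorms and t-norms are not covered by this argument). For $F = (\mu_i)$ and $G = (\nu_i)$ the pair $\{\max(\mu_i, \nu_i), \min(\mu_i, \nu_i)\}$ is just $\{\mu_i, \nu_i\}$ in some order, so the identity holds for any choice of g:

```latex
H(F \cup G) + H(F \cap G)
  = \sum_{i=1}^{k} \big[ g(\max(\mu_i,\nu_i)) + g(\min(\mu_i,\nu_i)) \big]
  = \sum_{i=1}^{k} \big[ g(\mu_i) + g(\nu_i) \big]
  = H(F) + H(G)
```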


Sets for which the above measures give high values may be described as vague, cloudy, hazy, unclear, indistinct, unsharp or fuzzy. These sets are located far away from the corners $C_f$ of $K_f$ and close to the center $F_{0.5}$. $H_{Kau}$ and $H_{Kos}$ both display this behavior very explicitly by relying on the distances $d_q(F, F^{near})$ and $d_q(F, F^{far})$. $F^{near}$ is the crisp set located in the corner closest to F, while $F^{far}$ is the crisp set located in the corner opposite to $F^{near}$, i.e. the corner farthest away from F. Figure 5.1 shows the discussed measures of fuzziness in the simple case of a two-dimensional universe of discourse. These surface plots display the characteristic peak at $F_{0.5}$ and the minima at $C_f$. One can see from figure 5.1 (as well as from table 5.1) that not all measures are differentiable. Only differentiable measures can be used in iterative gradient-based optimization schemes.
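The nearest and farthest corners are cheap to find: rounding each membership gives $F^{near}$, and flipping every coordinate of $F^{near}$ gives $F^{far}$. The following Python sketch implements $H_{Kau}$ and $H_{Kos}$ of table 5.1 on that basis; the function names are illustrative, and ties at exactly 0.5 are assumed to round to 0 (as in the definition of $\mu^{near}_i$).

```python
import math


def d_q(F, G, q=1.0):
    return sum(abs(a - b) ** q for a, b in zip(F, G)) ** (1.0 / q)


def nearest_crisp(F):
    """F^near: round every membership to 1 if it exceeds 0.5, else to 0."""
    return tuple(1 if m > 0.5 else 0 for m in F)


def farthest_crisp(F):
    """F^far: the corner opposite to F^near."""
    return tuple(1 - c for c in nearest_crisp(F))


def h_kau(F, q=1.0):
    """Kaufmann's index of fuzziness; F_0.5 gets the value 1, crisp sets 0."""
    return 2.0 / len(F) ** (1.0 / q) * d_q(F, nearest_crisp(F), q)


def h_kos(F, q=1.0):
    """Kosko's ratio of the distances to the nearest and the farthest corner."""
    return d_q(F, nearest_crisp(F), q) / d_q(F, farthest_crisp(F), q)


F = (0.9, 0.2, 0.6)
print(nearest_crisp(F), farthest_crisp(F))   # (1, 0, 1) (0, 1, 0)
print(h_kau(F), h_kos(F))
```

The denominator of `h_kos` never vanishes, because every coordinate of F differs by at least 0.5 from the corresponding coordinate of $F^{far}$.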


5.2 Geometrically inspired measures of Fuzziness. 


Let us now work out the ideas briefly presented in section 4.1 for the case of fuzziness. We define the degree of fuzziness to be directly related to the distance from the limit sets $\{F_{0.5}\}$ or $C_f$. As already mentioned, the cardinalities of these two regions in $K_f$ are very different. As visualized in figures 5.2(b) and (c) for the case of two dimensions, $\mathrm{card}(C_f) = 2^k$ (with k being the number of singletons), while the cardinality of $\{F_{0.5}\}$ is simply one, irrespective of the dimension of $K_f$.
[Figure 5.2 panels: (a) plot of g(t); (b) and (c) the unit square with corners (0,0), (1,0), (0,1) and (1,1)]


Figure 5.2: Figure (a) shows the function $g(t) := -t \log(t) - (1-t) \log(1-t)$ plotted for $t \in [0,1]$. Figures (b) and (c) depict the regions around the limit sets for fuzziness. The most fuzzy center point $F_{0.5}$ and its surroundings are white, while the corner points of $C_f$ and their surroundings are black. Measuring the distance to all the black regions around the corners in fig. (b) is much more costly than simply measuring the distance to $F_{0.5}$ as indicated in figure (c). For ease of visibility the case of fuzzy sets in two dimensions, $F_{2D} = (\mu_1, \mu_2)$, is depicted.


5.2.1 Measuring the distance from crisp sets. 


The (impractical) brute force approach would consist in aggregating 


$$ H_{\wedge}(F) := \bigwedge_{F' \in C_f} h(a(F, F')) \qquad (5.2) $$


The aggregation operator $\bigwedge$ may be any t-norm, such as $\bigwedge = \prod$ or $\bigwedge = \min$. T-norms are chosen because we want to ensure that $H_{\wedge}(F)$ is zero whenever one term in the aggregation becomes zero. The function $a : K_f \times K_f \to \mathbb{R}^+_0$ measures the distance between two fuzzy sets F and F', e.g. $a(F, F') := d_q(F, F')$. We assume here that $a(F, F')$ is invariant to permutations, $a(PF, PF') = a(F, F')$. The function $h : \mathbb{R}^+_0 \to [0, 1]$ is increasing and depends on the distance of F' to F: for small distances $h(x)$ takes values close to 0, while it rises to 1 for bigger distances such as $x = a(F_{0.5}, F')$ with $F' \in C_f$. Thus $h(x)$ is a potential which is sensitive to nearness to a corner point $F' \in C_f$. The aggregation $\bigwedge_{F' \in C_f} h(a(F, F'))$ gives the total potential measured at point F.


Considering all $F' \in C_f$ in equation (5.2) is of course impractical, but note that if $\bigwedge = \min$ we have

$$ H_{\wedge}(F) := \min_{F' \in C_f} h(d_q(F, F')) = h\left( \min_{F' \in C_f} d_q(F, F') \right) = h(d_q(F, F^{near})) \qquad (5.3) $$


because h is increasing, and $H_{\wedge}(F)$ becomes simply a rescaled version of Kaufmann's index of fuzziness $H_{Kau}(F)$ (see table 5.1). What reduces the computational burden in the above equation decisively is that the closest crisp set can be identified by a simple algorithm: each membership value is rounded to 0 or 1.
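The following Python sketch compares the brute-force aggregation (5.2) over all $2^k$ corners with the shortcut (5.3). The potential `h_increasing` is a hypothetical choice, $h(x) = 1 - e^{-\sigma x^p}$, picked only because it is increasing with $h(0) = 0$; the text does not prescribe it. With the t-norm min both routes agree, and the product t-norm can only give smaller or equal values.

```python
import itertools
import math


def d_q(F, G, q=2.0):
    return sum(abs(a - b) ** q for a, b in zip(F, G)) ** (1.0 / q)


def h_increasing(x, sigma=2.0, p=2.0):
    """Hypothetical increasing potential: 0 at x = 0, approaching 1 for large x."""
    return 1.0 - math.exp(-sigma * x ** p)


def h_wedge_bruteforce(F, tnorm=min, q=2.0):
    """Eq. (5.2): aggregate the potential over all 2**k crisp corners of K_f."""
    corners = itertools.product((0, 1), repeat=len(F))
    return tnorm([h_increasing(d_q(F, c, q)) for c in corners])


def h_wedge_nearest(F, q=2.0):
    """Eq. (5.3): with the t-norm min only the nearest crisp set matters."""
    near = tuple(1 if m > 0.5 else 0 for m in F)
    return h_increasing(d_q(F, near, q))


F = (0.8, 0.3, 0.55)
print(h_wedge_bruteforce(F), h_wedge_nearest(F))   # identical values
print(h_wedge_bruteforce(F, tnorm=math.prod))      # product t-norm: never larger
```

The brute-force version costs $O(k \, 2^k)$ operations, the shortcut only $O(k)$.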
[Figure 5.3 panels (a), (b), (c) and (d)]


Figure 5.3: Examples of the discussed measures of fuzziness for fuzzy sets $F_{2D} = (\mu_1, \mu_2)$ in two dimensions. Figure (a) uses the parameters σ = 2, p = 2, q = 2 and figure (b) uses σ = 2, p = 4, q = 2. Figures (c) and (d) show the measure of eq. (5.2) with the parameters σ = 2.5, p = 2, q = 2 and σ = 0.7, p = 2, q = 2, respectively, and $\bigwedge = \min$.


5.2.2 Measuring the distance from the center set. 


Another approach consists in measuring the distance only from $F_{0.5}$:

$$ H_{0.5}(F) := \bar{h}(a(F, F_{0.5})) \qquad (5.4) $$


The function $\bar{h}$ is high for distances close to zero and falls off for bigger distances. We may set for example $\bar{h}(x) = 1 - h(x)$. With the above expression we have switched to a different model for fuzziness: we no longer compare with crisp sets but with the unique most fuzzy set $F_{0.5}$.


We have $d_q(F_{0.5}, F') = 0.5 \, k^{1/q}$ when $F' \in C_f$ and $K_f$ has k dimensions. Setting $a(F, F') := \frac{2}{k^{1/q}} d_q(F, F')$, we can therefore assume that the distance from any corner set F' to the center $F_{0.5}$ is equal to unity, irrespective of the dimension. For example, letting $\bar{h}(x) := 1 + h_0 (e^{-\sigma x^p} - 1)$, we have $\bar{h}(0) = 1$, and the parameter $h_0$ can be chosen such that $\bar{h}(1) = 0$, namely $h_0 := 1/(1 - e^{-\sigma})$. The plots in fig. 5.3 give a visual impression of the above definitions. Of course, other functions $\bar{h}(x)$ may also lead to valid measures of fuzziness.
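A minimal Python sketch of the center-based measure (5.4), assuming the potential has the form $\bar{h}(x) = 1 + h_0(e^{-\sigma x^p} - 1)$ with $h_0 = 1/(1 - e^{-\sigma})$ and the rescaled distance $a = (2/k^{1/q}) \, d_q$. Only one distance has to be computed, whatever the dimension k; the name `h_center` is illustrative.

```python
import math


def d_q(F, G, q=2.0):
    return sum(abs(a - b) ** q for a, b in zip(F, G)) ** (1.0 / q)


def h_center(F, sigma=2.0, p=2.0, q=2.0):
    """Sketch of eq. (5.4) with h_bar(x) = 1 + h0 * (exp(-sigma * x**p) - 1)."""
    k = len(F)
    a = 2.0 / k ** (1.0 / q) * d_q(F, (0.5,) * k, q)  # rescaled: corners at 1
    h0 = 1.0 / (1.0 - math.exp(-sigma))              # chosen so h_bar(1) = 0
    value = 1.0 + h0 * (math.exp(-sigma * a ** p) - 1.0)
    return max(0.0, value)  # clamp tiny negative rounding errors at the corners


for F in [(0.5, 0.5), (0.7, 0.4), (0.9, 0.1), (1, 0)]:
    print(F, round(h_center(F), 4))
```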


The idea of using “potentials” to set up measures of uncertainty is related to function approximation. For example, in eq. (5.4) we have one “bump” at the center $F_{0.5}$. Because of this connection to function approximation, the parameters of measures of fuzziness can also be estimated from examples.
We have rediscovered well-known measures of fuzziness by using the geometric principles stated in the beginning. This strengthens our belief in the soundness of the outlined programme. For measures of non-specificity the geometric viewpoint will offer new perspectives.
Chapter 6


Measuring Non-Specificity. 


6.1 A Review of Measures of Non-specificity. 


Again we start by reviewing existing measures of non-specificity. Since some definitions are tailored to particular types of fuzzy sets, it proves useful to present measures of non-specificity for sum-normal, max-normal, and general fuzzy sets in separate sections. Many different measures of non-specificity are at our disposal for sum-normal or max-normal fuzzy sets; only very few measures exist for non-normal sets.


6.1.1 Measures of Non-Specificity in $K_s$.


For sum-normal fuzzy sets there are at least three desiderata when measuring non-specificity:


Requirements for measures of non-specificity in $K_s$:


1. Minimality: $H(F)$ is minimum $\Leftrightarrow F \in C_s$.
2. Maximality: $H(F)$ is maximum $\Leftrightarrow F = F_{1/k} = (1/k, \ldots, 1/k)$.
3. Symmetry: $H(PF) = H(F)$ for all permutations P.


Minimality: This property defines the set of most specific fuzzy sets. The configurations in which only one singleton receives all of the obtainable confidence mass obtain the minimum value. The set of corner points $C_s$ is the set representing totally unambiguous fuzzy sets.


Maximality: The least specific configuration within $K_s$ is the one where all singletons receive an equal amount of confidence mass. $F_{1/k}$ is the fuzzy set in the center of $K_s$ and represents the most ambiguous situation possible within $K_s$.


Symmetry under Permutations: Because of permutation symmetry, measures of non-specificity do not depend on the way we enumerate the members $u_i$ of the universe of discourse U. This is usually reasonable if the $u_i$ stand for different classes and $\mu(u_i)$ denotes the confidence for a specific class. Measures that do not fulfill property (3) will be discussed in chapter 7.


Some common measures of non-specificity in $K_s$ which fulfill all three requirements are listed in table 6.1. The most prominent measure of ambiguity in $K_s$ is, of course, Shannon entropy $H_{SE}(F) := -\alpha \sum_{i=1}^{k} \mu_i \log \mu_i$. It stems from information theory [121, 122], though its use in physics dates back to Boltzmann. $H_{SE}$ has been used extensively in applications of probability theory. It can be shown that $H_{SE}$ is the unique measure which fulfills all three requirements stated above plus a fourth one: additivity, a property which relates Shannon entropy to (probabilistic) product fusion for non-interactive events and establishes a link between probability theory and Shannon entropy.
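A short Python sketch of $H_{SE}$ on sum-normal fuzzy sets, assuming the natural logarithm and $\alpha = 1$. It shows the minimum at a corner, the maximum at the center, and additivity under product fusion of two non-interactive sources; `product_fusion` is a hypothetical helper that forms the outer product of the two confidence vectors.

```python
import math


def h_se(P, alpha=1.0):
    """Shannon entropy -alpha * sum p log p of a sum-normal fuzzy set P."""
    return alpha * sum(-p * math.log(p) for p in P if p > 0)


def product_fusion(P, Q):
    """Joint confidences of two non-interactive sources (outer product)."""
    return [p * q for p in P for q in Q]


P = (0.7, 0.2, 0.1)
Q = (0.5, 0.5)
print(h_se((1, 0, 0)), h_se(P), h_se((1 / 3,) * 3))   # minimum, in between, maximum
# additivity under product fusion: H(P x Q) = H(P) + H(Q)
assert math.isclose(h_se(product_fusion(P, Q)), h_se(P) + h_se(Q))
```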


As a matter of fact, various other measures of non-specificity listed in table 6.1 were first defined in the context of probability theory: Fermi's quadratic entropy [38], Daróczy's entropy of order β [39], Rényi's entropy of order α [112], and Boekee & van der Lubbe's r-norm entropy [25, 133] have all been proposed to replace Shannon entropy in certain applications of probability theory. However, since the property of additivity uniquely defines Shannon entropy as the most appropriate measure whenever non-interactivity leads to product fusion, none of the alternatives has become as popular as Shannon entropy. For our applications the property of additivity is not as decisive as the other three properties stated at the beginning of this section, which are shared by all of the probabilistically motivated measures of non-specificity. Thus we can use such measures to estimate non-specificity also for sum-normal fuzzy sets, even though the interpretation of a membership value can be completely different from the interpretation of a probability.
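To illustrate that the probabilistically motivated alternatives share the minimality, maximality and symmetry requirements, here is a Python sketch of Rényi's, Daróczy's and the r-norm entropy, assuming the standard textbook forms with the natural logarithm for Rényi and the $(2^{1-\beta} - 1)^{-1}$ normalisation for Daróczy; parameter defaults are arbitrary illustrative choices.

```python
import math


def h_renyi(P, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    return math.log(sum(p ** alpha for p in P if p > 0)) / (1.0 - alpha)


def h_daroczy(P, beta=2.0):
    """Daroczy entropy of order beta (beta > 0, beta != 1)."""
    return (sum(p ** beta for p in P if p > 0) - 1.0) / (2.0 ** (1.0 - beta) - 1.0)


def h_rnorm(P, r=2.0):
    """Boekee & van der Lubbe r-norm entropy (r > 0, r != 1)."""
    return r / (r - 1.0) * (1.0 - sum(p ** r for p in P) ** (1.0 / r))


corner, middle, center = (0, 1, 0), (0.6, 0.3, 0.1), (1 / 3, 1 / 3, 1 / 3)
for h in (h_renyi, h_daroczy, h_rnorm):
    assert math.isclose(h(corner), 0.0, abs_tol=1e-12)  # minimum at the corners of K_s
    assert h(middle) < h(center)                        # center scores higher
```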


Let us now consider yet another case in which the mathematical formalism stays the same but the interpretation changes. Every measure of fuzziness in $K_f$, when restricted to $K_s$, will satisfy the requirement of minimality by definition, because the only crisp sets in $K_s$ are the corner points $C_s$. The same is true for symmetry. And quite a few measures of fuzziness even fulfill the property of maximality. In fact, DeLuca and Termini's generalized entropy, Pal and Pal's entropy and Bhandari and Pal's entropy are measures of fuzziness in $K_f$ that fulfill all properties of measures of non-specificity in $K_s$. Hence they are listed in both tables 5.1 and 6.1. For other measures of fuzziness the property of maximality has to be checked separately in each case. For example, for a universe of discourse of dimension $k = 3$, $H_{Kos}$ with $q = 2$ does not satisfy maximality, while it does with $q = 1$. Still, many measures of fuzziness, when used only within $K_s$, fulfill all properties of a measure of non-specificity. This circumstance, together with the misnomer “entropy”¹ for some measures of fuzziness, may cause confusion and, in the worst case, even misapplication of measures of imprecision.


This can occur because extrapolating the formulae in table 6.1 from the domain $K_s$ to $K_f$ may change the interpretation of the measure. For example, $H_{DTE}$ can be used perfectly well for measuring non-specificity when applied only to sum-normal fuzzy sets in $K_s$. But it is not a good idea to use $H_{DTE}$ for measuring ambiguity in $K_f$, as it gives zero for all crisp sets, very non-specific ones included. $H_{DTE}$ can only be used to measure fuzziness in $K_f$. For measures of non-specificity stemming from probability theory there is usually no point in extending the domain of application beyond $K_s$. Doing so with the original Shannon entropy, for example, one ends up with an expression which achieves its maximum in $K_f$ at $(1/e, \ldots, 1/e)$, a location that is not at all distinguished.


¹This terminology has already been criticized before [46].


Author(s) | Name | Definition
DeLuca and Termini [41] | Generalized Entropy | $H_{DTE}(F) := -\alpha \sum_{i=1}^{k} [\mu_i \log \mu_i + (1-\mu_i) \log(1-\mu_i)]$
Pal and Pal [95] | Entropy | $H_{PPE}(F) := \alpha \sum_{i=1}^{k} [\mu_i e^{1-\mu_i} + (1-\mu_i) e^{\mu_i} - 1]$
Bhandari and Pal [16] | Entropy | $H_{BPE}(\alpha, F) := \frac{1}{k(1-\alpha)} \sum_{i=1}^{k} \log(\mu_i^\alpha + (1-\mu_i)^\alpha)$
Shannon [121, 122] | Entropy | $H_{SE}(F) := -\alpha \sum_{i=1}^{k} \mu_i \log \mu_i$
Fermi, in [38] | Quadratic Entropy | $H_{FQE}(\alpha, F) := \alpha \sum_{i=1}^{k} \mu_i (1-\mu_i)$
Daróczy [39] | Entropy of order β | $H_{DE}(\beta, F) := (2^{1-\beta} - 1)^{-1} \left( \sum_{i=1}^{k} \mu_i^\beta - 1 \right)$
Rényi [112] | Entropy of order α | $H_{RE}(\alpha, F) := \frac{1}{1-\alpha} \log \sum_{i=1}^{k} \mu_i^\alpha$
Boekee & van der Lubbe [25, 133] | r-norm entropy | $H_{rE}(r, F) := \frac{r}{r-1} \left[ 1 - \left( \sum_{i=1}^{k} \mu_i^r \right)^{1/r} \right]$


Table 6.1: Some common measures of non-specificity in $K_s$. Authors, names and definitions. The fuzzy set F is defined through the vector $\mu = (\mu_1, \ldots, \mu_k)$ of membership values.
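Both pitfalls can be reproduced numerically. The Python sketch below assumes natural logarithms and unit normalisation constants: a grid search shows that the Shannon expression, evaluated outside $K_s$, peaks at about $1/e$ per coordinate, and $H_{DTE}$ assigns zero both to a singleton and to the least specific crisp set $(1, 1, 1)$.

```python
import math


def shannon_extrapolated(F):
    """The Shannon expression -sum m log m applied to any fuzzy set in K_f."""
    return sum(-m * math.log(m) for m in F if m > 0)


def h_dte(F):
    """DeLuca-Termini entropy with alpha = 1 and 0 * log 0 := 0."""
    def xlogx(x):
        return 0.0 if x == 0 else x * math.log(x)
    return sum(-xlogx(m) - xlogx(1 - m) for m in F)


# the expression is a sum of identical one-dimensional terms,
# so its maximum lies on the diagonal (b, ..., b)
k = 3
grid = [i / 1000 for i in range(1001)]
b_best = max(grid, key=lambda b: shannon_extrapolated((b,) * k))
print(b_best, 1 / math.e)                 # about 0.368 in both cases

# H_DTE gives 0 both to a singleton and to the least specific crisp set
print(h_dte((1, 0, 0)), h_dte((1, 1, 1)))
```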


6.1.2 Measuring Non-Specificity in $K_m$.


For max-normal fuzzy sets some measures are available from possibility theory. Dubois and Prade [46] discuss such indices of (im-)precision, which were introduced by Yager [135]. A measure of specificity is a function $H(F)$ from $K_f$ into $[0,1]$ which is monotonically decreasing with respect to fuzzy set inclusion (a measure of non-specificity is increasing), and which equals 1 only for singleton subsets of the reference set U (i.e., the most precise values). In $K_m$, measures of non-specificity should fulfill the following requirements:


Requirements for measures of non-specificity in $K_m$:


1. Minimality: $H(F)$ is minimum $\Leftrightarrow F \in C_s$.
2. Maximality: $H(F)$ is maximum $\Leftrightarrow F = F_1 = (1, 1, \ldots, 1)$.
3. Inclusion: $F \subseteq G \Rightarrow H(F) \le H(G)$.
4. Symmetry: $H(PF) = H(F)$ for all permutations P.


Minimality: Again it is the singleton subsets that are most specific. Maximality: Within $K_m$, the unique least specific fuzzy set has all membership values equal to one. Inclusion: If a fuzzy set is included in another set, it is considered to be at least as specific as the other one. Note that this property has been of no importance for sum-normal fuzzy sets, because if F and F' are sum-normal then $F \subseteq F' \Leftrightarrow F = F'$. Symmetry under permutation expresses the fact that the labels attached to the singletons may be arbitrarily permuted; it is an additional property that we demand here.


While minimality and maximality define the limit sets of most and least specific sets 
the requirement of inclusion is a possible requirement for the transition from minimal to 
maximal non-specificity. Note that the requirement of maximality can actually be dropped 
from the list since it is a consequence of minimality and inclusion. Indices of imprecision 
as understood by Dubois and Prade are defined only through minimality and inclusion 
and are thus more general than our measures of non-specificity for max-normal fuzzy sets 
because they need not be symmetric. 


Table 6.2 shows three common measures of non-specificity for max-normal fuzzy sets. The measures have also been restated in terms of (max-normal) possibility distributions $\pi := (\pi_1, \ldots, \pi_k)$ with $\pi_j = \pi(u_{i_j}) := \mu_F(u_{i_j})$, where the indices $i_j$ have been chosen such that $\pi(u_{i_j}) \ge \pi(u_{i_{j+1}})$, i.e. the $\pi_j$ are equal to the sorted membership values, with $\pi_1 = \max_{1 \le i \le k} \mu_F(u_i) = 1$. The third line shows the corresponding expression in evidence theory. See for example [45, 46, 69] for a discussion of the notation and of the translation of these expressions from one theory to the other.
Author(s) | Name | Definition
Yager [135, 136] | Nonspecificity | $NSp(F) := 1 - \int_0^1 \frac{1}{|{}^{\alpha}F|} \, d\alpha$
 | | $NSp(\pi) := 1 - \sum_{i=1}^{k} \frac{1}{i} (\pi_i - \pi_{i+1})$
 | | $NSp(m) := 1 - \sum_{X \subseteq U} \frac{m(X)}{|X|}$
Higashi & Klir [62] | Non-Specificity | $N(F) := \frac{1}{h(F)} \int_0^{h(F)} \log_2 |{}^{\alpha}F| \, d\alpha$
 | | $N(\pi) := \sum_{i=2}^{k} (\pi_i - \pi_{i+1}) \log_2 i$
 | | $N(m) := \sum_{X \subseteq U} m(X) \log_2 |X|$
Lamata & Moral [76] | Imprecision | $W(F) := \log_2 \left( \sum_{i=1}^{k} \mu_i \right)$
 | | $W(\pi) := \log_2 \left( \sum_{i=1}^{k} \pi_i \right)$
 | | $W(m) := \log_2 \left( \sum_{X \subseteq U} m(X) |X| \right)$


Table 6.2: Some measures of non-specificity for max-normal fuzzy sets ($h(F) = 1$); ${}^{\alpha}F$ denotes the α-cut of F and $\pi_{k+1} := 0$. The second line shows the corresponding expression in the possibilistic formulation (assuming ordered possibility distributions), while the third line shows the analogue in evidence theory. See for example [45, 46, 69] for the transition from one formulation to the other.
If f is an increasing function $f : [0, \infty) \to [0, 1]$, one can obtain normalized measures of imprecision from Higashi and Klir's measure of non-specificity N [62] and Lamata and Moral's measure of imprecision W [76] by using $f \circ N$ and $f \circ W$. Note that the only important contribution in W is simply the cardinality $|F| := \sum_i \mu_F(u_i)$ of the fuzzy set F. For max-normal distributions the cardinality cannot be smaller than one.
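The possibilistic and evidential lines of table 6.2 coincide because a sorted, max-normal possibility distribution corresponds to a consonant body of evidence: the nested focal set of the i largest memberships, of size i, carries mass $\pi_i - \pi_{i+1}$. The Python sketch below builds the three measures on that correspondence; all function names are illustrative, and the final check confirms that $\sum_X m(X)|X| = |F|$ for one example.

```python
import math


def sorted_possibility(F):
    """Sorted memberships pi_1 >= ... >= pi_k, followed by pi_{k+1} = 0."""
    pi = sorted(F, reverse=True)
    if not math.isclose(pi[0], 1.0):
        raise ValueError("Table 6.2 assumes a max-normal fuzzy set, h(F) = 1")
    return pi + [0.0]


def consonant_masses(F):
    """Map |focal set| -> mass: the nested set of size i gets pi_i - pi_{i+1}."""
    pi = sorted_possibility(F)
    return {i: pi[i - 1] - pi[i] for i in range(1, len(F) + 1)}


def nsp_yager(F):
    return 1.0 - sum(m / size for size, m in consonant_masses(F).items())


def n_higashi_klir(F):
    return sum(m * math.log2(size) for size, m in consonant_masses(F).items())


def w_lamata_moral(F):
    # only the cardinality |F| = sum of memberships enters
    return math.log2(sum(F))


F = (0.3, 1.0, 0.6, 0.0)
w_evidence = math.log2(sum(size * m for size, m in consonant_masses(F).items()))
print(nsp_yager(F), n_higashi_klir(F), w_lamata_moral(F), w_evidence)
```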


6.1.3 Measures of Non-Specificity in $K_f$.


Let us start the review by proposing a few basic requirements for measures of non-specificity in $K_f$. They are motivated by analogy with the requirements in $K_s$ and $K_m$, and by the fact that they are the minimum common properties of the existing measures of non-specificity in $K_f$.


Proposed requirements for measures of non-specificity in $K_f$:


1. Minimality: $H(F)$ is minimum $\Leftrightarrow F \in C_s$,
   or $H(F)$ is minimum $\Leftrightarrow F \in C_s \cup L$ with $L \subset K_f$.
2. Maximality: $H(F)$ is maximum $\Leftrightarrow F = (\beta', \ldots, \beta')$ with $\beta' \in S \subseteq [0, 1]$.
3. Symmetry: $H(PF) = H(F)$ for all permutations P.


Minimality: Measures of non-specificity should achieve their minimum values for the least ambiguous fuzzy sets. Usually only sets in $C_s$ qualify for being termed least ambiguous or most specific. Sometimes it may also be appropriate to require minimal ambiguity for some additional fuzzy sets $F' \in L$. Most often the set L will consist of the fuzzy sets $(\beta, 0, \ldots, 0)$, $\beta \in [0, 1]$, and their permutations. Since non-specificity expresses one-to-many relationships, it may appear natural to assign these sets the minimum value, as only one element $u_i$ is present, with a membership value of β.


Maximality: Because of the requirement of maximality, only fuzzy sets along the diagonal of $K_f$ should be among the least specific fuzzy sets. The diagonal sets $(\beta, \ldots, \beta)$ have equal membership values $\mu_i = \beta$ for all $u_i$. Usually $H(F_1)$, with $F_1 = (1, \ldots, 1)$, gives the maximum value for non-specificity, but other diagonal sets with $\beta < 1$ may give the same value.


Symmetry: The requirement of symmetry states that non-specificity should not depend upon the way we enumerate the singletons $u_i$. For non-symmetric measures of imprecision see chapter 7.


Some existing measures of non-specificity in $K_f$ display another property:

A possible additional property of measures of non-specificity in $K_f$:

4. Scaling: $H(F) = f(\beta^{-1})\,H(\beta F)$ for $\beta \in [0, h(F)^{-1}]$, $f: \mathbb{R}_0^+ \to \mathbb{R}_0^+$.
The scaling property relates non-specificity for different fuzzy sets. With $f(\beta^{-1}) = 1$ non-specificity is independent of the height of $F$, i.e. $F_a = (0.5, 0.5, 0, \ldots, 0)$ is regarded to be as ambiguous as $F_b = (1, 1, 0, \ldots, 0)$. Taking $f(\beta^{-1}) = \beta^{-1}$ the set $F_b$ is regarded to be twice as ambiguous as $F_a$. The existing measures of non-specificity which fulfill the scaling property (i.e. $H_{KaE}$ and $H_{HKN}$) use $f \equiv 1$. Because of this the set $F_0$ causes trouble: we obtain different limiting values $\lim_{\beta \to 0} H(\beta F)$ for different ways $\beta F$ of approaching it. In each neighborhood of $F_0$ all possible values of $H$ can be found. This makes it impossible to define $H(F_0)$ such that $H$ would be smooth everywhere. Thus in the vicinity of $F_0$ small variations in the fuzzy set's membership values can lead to extremely large variations in the estimate of non-specificity. If all possible $F \in K_f$ are to be assigned a value of ambiguity, smoothness or even differentiability of $H(F)$ is necessary (e.g. in a gradient based optimization algorithm), and scaling should be retained, then some modifications of $f(\beta^{-1})$ for fuzzy sets close to $F_0$ are necessary. We will say more about the special role of the set $F_0$ for measures of non-specificity below.


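Author(s) | Name | Definition
Kaufmann [66] | Entropy | $H_{KaE}(F) := -\sum_{i=1}^{k} p_i \log p_i$ with $p_i := \mu_i / \sum_{j=1}^{k} \mu_j$
Higashi & Klir [62] | Non-Specificity | $H_{HKN}(F) := \frac{1}{h(F)} \int_0^{h(F)} \log_2 |{}^{\alpha}F| \, d\alpha$
Yager [137] | Non-Specificity | $H_{YaN}(F) := \dfrac{Sp_{max} - (\max_i \mu_i - \bar\mu)}{Sp_{max}}$ with $\bar\mu := \sum_i \mu_i / k$ and $Sp_{max} := 1 - 1/k$

Table 6.3: Some common measures of non-specificity in $K_f$. Authors, names and definitions. The fuzzy set $F$ is defined through the vector $\mu = (\mu_1, \ldots, \mu_k)$ of membership values; ${}^{\alpha}F$ denotes the $\alpha$-cut of $F$.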


Table 6.3 contains some common measures of non-specificity in $K_f$. $H_{KaE}$ is an "extension" of Shannon entropy to $K_f$, while $H_{HKN}$ has been obtained by applying the extension principle to ordinary Hartley entropy [62, 60]. Both display similar qualitative behavior in $K_f$. Figure 6.1 shows these two measures in the case of a two-dimensional universe of discourse, and slices cut through $K_f$ in the case of a three-dimensional universe $U$.


During the calculation of $H_{KaE}$ the fuzzy sets are first sum-normalized in order to make Shannon entropy meaningful. Geometrically speaking, sum-normalization of the fuzzy set $F$ amounts to projecting $F$ onto $K_s$ along the ray emanating from the origin $F_0$ and passing through $F$. All sets along the ray through $F_0$ and $F$ return the same value for $H_{KaE}$. Thus $H_{KaE}(F)$ fulfills the scaling property with $f \equiv 1$. The scaling property, together with the restriction that $H_{KaE}$ should coincide with Shannon entropy in $K_s$, suffices to determine its value everywhere in $K_f$. $H_{KaE}$ is minimal for all fuzzy sets $F = (0, \ldots, 0, \beta, 0, \ldots, 0)$ with $\beta \in (0,1]$. $H_{KaE}$ is maximal for points on the diagonal $F_d(\beta)$, with the exception of the special set $F_0$.

[Figure 6.1, panels: (a) $H_{KaE}(F_{2D})$, (b) $H_{HKN}(F_{2D})$, (c) $H_{YaN}(F_{2D})$; (d) to (f) the same measures for $F_{3D}$; (g) to (i) the same measures for $F'_{3D}$.]

Figure 6.1: Measures of non-specificity in $K_f$ depicted for fuzzy sets $F_{2D} = (\mu_1, \mu_2)$, $F_{3D} = (\mu_1, \mu_2, \mu_3)$ and $F'_{3D} = (\mu_1, \mu_2, \mu_3)$. In the slices of figures (c) and (d) greater intensity corresponds to greater ambiguity.


Turning to $H_{HKN}(F)$ we see that the normalization step of dividing by the height $h(F)$ of the fuzzy set $F$ amounts to transforming $F$ into $F'$, with $F'$ being the max-normal counterpart of $F$. Again $H_{HKN}$ fulfills the scaling property $H_{HKN}(F) = H_{HKN}(\beta F)$. Max-normalization can also be interpreted geometrically: it projects the sets $F$ onto those faces of $K_f$ where at least one $\mu_i = 1$, which together form the set of normal fuzzy sets $K_n$. Therefore knowing the values of $H_{HKN}$ in $K_n$ suffices to determine its value everywhere inside $K_f$. Regarding minimality and maximality the same conclusions can be drawn as before.


Concerning $H_{YaN}$ we have a different situation. $H_{YaN}$ is derived from a very simple measure of specificity: $\max_i \mu_i - \bar\mu$. The maximum value of $H_{YaN}$ is achieved for all points along the diagonal $F_d$. The minimum value is achieved only for $F \in C_s$. The measure is well-behaved in the vicinity of $F_0$, giving a value of maximal non-specificity for $F_0$. The measure is non-differentiable (because of the maximum operation), which prevents its direct use in gradient based optimization algorithms.
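To make the three definitions of Table 6.3 concrete, here is a minimal sketch in Python. It assumes natural logarithms and no extra normalization constant for $H_{KaE}$, and evaluates the $\alpha$-cut integral of $H_{HKN}$ exactly for a finite universe (the cut size is piecewise constant between sorted membership levels). The function names are ours, not the thesis's.

```python
import math

def kaufmann_entropy(mu):
    """H_KaE sketch: Shannon entropy (natural log) of the sum-normalized memberships.
    Undefined for the empty fuzzy set F_0, so we raise in that case."""
    total = sum(mu)
    if total == 0:
        raise ValueError("H_KaE is undefined for F_0")
    p = [m / total for m in mu]          # projection onto the simplex K_s
    return -sum(x * math.log(x) for x in p if x > 0)

def higashi_klir(mu):
    """H_HKN sketch: (1/h) * integral_0^h log2 |alpha-cut| d(alpha).
    On a finite universe the alpha-cut size is piecewise constant, so the
    integral reduces to a sum over the sorted membership levels."""
    h = max(mu)
    if h == 0:
        raise ValueError("H_HKN is undefined for F_0")
    levels = sorted(mu, reverse=True) + [0.0]
    integral = 0.0
    for j in range(1, len(mu) + 1):
        # for alpha in (levels[j], levels[j-1]] exactly j elements are in the cut
        integral += (levels[j - 1] - levels[j]) * math.log2(j)
    return integral / h

def yager_nonspecificity(mu):
    """H_YaN sketch: (Sp_max - (max_i mu_i - mean(mu))) / Sp_max, Sp_max = 1 - 1/k."""
    k = len(mu)
    sp_max = 1.0 - 1.0 / k
    sp = max(mu) - sum(mu) / k
    return (sp_max - sp) / sp_max
```

Comparing `kaufmann_entropy([0.5, 0.5, 0.0])` with `kaufmann_entropy([1.0, 1.0, 0.0])` illustrates the scaling property with $f \equiv 1$, and `yager_nonspecificity([0.0, 0.0, 0.0])` shows that $H_{YaN}$ assigns maximal non-specificity to $F_0$.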


Non-Specificity and $F_0$.


Before continuing we must discuss the role of the special set $F_0$, which expresses a one-to-none relationship. $F_0$ captures the NULL-event, i.e. none of the possibilities $u_i$ is realized. If we work with a closed world model, the fuzzy set $F_0$ has to be regarded as an invalid outcome of processing. Only if we admit that the world model we use may be open, or is definitely open, can $F_0$ be given the above interpretation. Even with an open world model $F_0$ may not be interpretable if one of the singletons $u_i$, say $u_0$ (the NULL-Label), already captures the NULL-event. In that case all the singletons $u_i$, $i = 1, 2, \ldots$ are regular events (such as e.g. class 7), while $u_0$ is the NULL-Label indicating the NULL-event (none of the $u_i$, $i = 1, 2, \ldots$). For normal fuzzy sets the use of a NULL-Label is mandatory if NULL-events are to be allowed, because $F_0$ lies outside $K_s$ and $K_n$.


If $F_0$ is an invalid outcome, it is the task of the pre-processing modules to avoid such an output, e.g. by restriction to fuzzy sets obeying $\sum_i \mu(u_i) \geq 1$. But even if $F_0$ is a valid result (because it is used to indicate another possibility which is not included in the universe $U$), there is still no straightforward way of defining non-specificity for $F_0$. Since $F_0$ transcends the original universe $U$, we do not actually know whether $F_0$ indicates one specific alternative or many different possibilities that have not been modeled by $U$. Consequently one cannot expect a unique and clear definition of non-specificity for $F_0$. This is in fact the deeper reason why many measures of non-specificity are no longer valid for the fuzzy set $F_0$ and display highly non-smooth behavior in its vicinity.


However, from a purely pragmatic point of view this situation can be inconvenient, since it may be necessary to always check for the special case of fuzzy sets close to $F_0$. Depending on the application, a certain arbitrarily defined value of non-specificity for $F_0$ may automatically produce the desired behavior of the program that uses measures of non-specificity to guide processing. In such a situation there is little reason to refrain from a definite assumption for the value of non-specificity for $F_0$.


For example $H_{YaN}$ assigns the maximum possible value of non-specificity to $F_0$. It may be desirable to set $H(F_0)$ to the maximum in order to include a tendency to avoid trivial solutions in the program, or if $F_0$ is actually an invalid outcome (e.g. because of the existence of a NULL-Label). If $F_0$ is delivered by the pre-processing module, it may be easiest to avoid further processing of this possibility by associating a high estimate of non-specificity with $F_0$. Other measures which give high values for $F_0$ will be considered in section 6.2.2.


Below we will discuss scaling functions $f(\beta^{-1})$ which smoothly enforce $H(F_0) = 0$ for generalizations of $H_{HKN}$ and $H_{KaE}$. One rationale for preferring $H(F_0) = 0$ may be that this is also the value that would be obtained if a NULL-Label $u_0$ was used with $\mu_F(u_0) = 1$ while $\mu_F(u_i) = 0$, $i > 0$. It will depend on the application which value of $H(F_0)$ is most convenient.


6.1.4 A slight extension of existing measures of Non-Specificity. 


We conclude the review of existing measures of non-specificity here and briefly extend the domain of some of the measures defined for normal fuzzy sets to general fuzzy sets. The basic idea is simple: just as $H_{KaE}$ and $H_{HKN}$ can be defined first on subsets of $K_f$ and then be extended to $K_f$ through the scaling property, we can scale up the domain of other measures of non-specificity such that every possible fuzzy set is covered. We use

$$H(F) := f(\beta^{-1})\, H(\beta F) \qquad (6.1)$$

with $\beta^{-1} := \sum_{i=1}^{k} \mu_i$ for measures $H$ that have previously been defined only within $K_s$, and $\beta^{-1} := \max_{i=1}^{k} \mu_i$ for measures $H$ whose original domain is $K_n$. We have already discussed the problems that arise with the fuzzy set $F_0$ when defining non-specificity through eq. (6.1). Of course one can stick to $f(\beta^{-1}) = 1$, but the only way to obtain a smooth measure $H(F)$ in $K_f$ is to demand $\lim_{\beta^{-1} \to 0} f(\beta^{-1}) = 0$. For fuzzy sets that are not close to $F_0$ it seems quite natural to set $H(F) \approx H(\beta F)$, i.e. if $\beta^{-1} >$ threshold then $f(\beta^{-1}) \approx 1$. Both demands are satisfied by choosing, for example, an exponential form $f(\beta^{-1}) = e^{(\ldots)}$ with $\sigma \gg 1$ and $q > 0$.
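The extension of eq. (6.1) can be sketched as follows. The concrete scaling function $f(x) = 1 - e^{-\sigma x^q}$ used here is an assumption chosen only because it meets both demands ($f(0) = 0$ and $f \approx 1$ away from $F_0$); the base measure (plain Shannon entropy on $K_s$) and all helper names are likewise illustrative.

```python
import math

def smooth_f(x, sigma=50.0, q=2.0):
    """ASSUMED scaling function with f(0) = 0 and f(x) ~ 1 away from 0.
    Illustrative only; not necessarily the form used in the thesis."""
    return 1.0 - math.exp(-sigma * x ** q)

def shannon_on_ks(p):
    """Example base measure defined on the simplex K_s."""
    return -sum(x * math.log(x) for x in p if x > 0)

def extend_to_kf(h_base, mu, normalizer="sum", sigma=50.0, q=2.0):
    """Eq. (6.1): H(F) := f(beta^-1) * H(beta F).
    normalizer="sum": beta^-1 = sum(mu)  (base measure defined on K_s)
    normalizer="max": beta^-1 = max(mu)  (base measure defined on K_n)"""
    inv_beta = sum(mu) if normalizer == "sum" else max(mu)
    if inv_beta == 0.0:
        # F_0: for a bounded base measure the product tends to 0 since f(0) = 0
        return 0.0
    scaled = [m / inv_beta for m in mu]    # beta * F lies in K_s (or K_n)
    return smooth_f(inv_beta, sigma, q) * h_base(scaled)
```

With $f \equiv 1$ the value near $F_0$ would depend only on the direction of approach; the assumed $f$ instead pulls every direction smoothly to $H(F_0) = 0$, as with `extend_to_kf(shannon_on_ks, [1e-3, 1e-3, 0.0])`.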


Figure 6.2 depicts some of the newly defined measures in the case of a two-dimensional universe. The definitions of the original measures $H(F)$ which have been used in eq. (6.1) can be found in tables 6.1 and 6.2.

[Figure 6.2, panels: (a) to (h) extended measures $H(\cdot, F)$ with parameter values 0.5, 2 and 4; (i) $NSp(F_{2D})$; (j) $W(F_{2D})$.]

Figure 6.2: Measures of non-specificity $H(F_{2D}) = f(\beta^{-1})\,H(\beta F_{2D})$ which are defined in $K_f$ through extending the domain of measures defined on $K_s$ and $K_n$. $F_{2D} = (\mu_1, \mu_2)$ with $\beta^{-1} = \mu_1 + \mu_2$ or $\beta^{-1} = \max(\mu_1, \mu_2)$. See also eq. (6.1) and tables 6.1 and 6.2.
Figure 6.3: The geometry underlying measures of non-specificity in $K_s$. For ease of visibility the case of fuzzy sets in three dimensions $F_{3D} = (\mu_1, \mu_2, \mu_3)$ is depicted.


6.2 New Measures of Non-Specificity. 


We now go on to pursue the program that has been outlined in the introduction and 
define new measures of non-specificity through geometric considerations. 


6.2.1 Measuring Non-Specificity through distances in $K_s$.


Following the ideas stated in the introduction, let us first define measures of non-specificity that rely on comparing the fuzzy set $F$ with the completely specific distributions $F' \in C_s$. In this section we will define the measures first in $K_s$ and then apply the "extension principle" already used in section 6.1.4.


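$$H_{sa}(F) := \bigwedge_{F' \in C_s} h\big(a(F, F')\big) \quad \text{if } F \in K_s, \qquad (6.2)$$
$$H_{sa}(F) := f(\beta^{-1})\, H_{sa}(\beta F) \quad \text{with } \beta^{-1} := \sum_i \mu_F(u_i). \qquad (6.3)$$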


The aggregation operator $\bigwedge$ may be any t-norm such as $\bigwedge = \prod$ or $\bigwedge = \min$. The function $a: K_f \times K_f \to \mathbb{R}_0^+$ measures the distance between two fuzzy sets $F$ and $F'$, e.g. $a(F, F') := d_q(F, F')$. We assume here that $a(F, F')$ is invariant to permutations, $a(F, F') = a(PF, PF')$. The function $h: \mathbb{R}_0^+ \to [0,1]$ depends on the distance of $F$ to $F'$: for small distances $h(x)$ achieves values close to 0, while it rises to 1 for bigger distances such as $x = a(F_{1/k}, F')$ with $F' \in C_s$. $h(x)$ is thus a potential which is sensitive to nearness to an unambiguous point $F' \in C_s$. The aggregation $\bigwedge_{F' \in C_s} h(a(F, F'))$ gives the total potential measured at point $F$.


Considering all $F' \in C_s$ in equation (6.2) can be cumbersome, since for a $k$-dimensional $K_s$ we have $k$ different sets $F'$, and other difficulties arise in the regions of overlap. Similar to the technique applied for the measure of fuzziness, we may restrict attention to $\bigwedge = \min$ and use

$$H_{sa}(F) = \min_{F' \in C_s} h\big(d_q(F, F')\big) = h\Big(\min_{F' \in C_s} d_q(F, F')\Big) = h\big(d_q(F, F^{near})\big) \qquad (6.4)$$

with $\mu_{F^{near}}(u_j) = 1$ if $u_j = \arg\max_{u_i} \mu_F(u_i)$ and $\mu_{F^{near}}(u_j) = 0$ otherwise. $F^{near} \in C_s$ is the closest unambiguous corner point to $F$. (The second equality holds because $h$ is monotonically increasing.)
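The reduction in eq. (6.4) from a minimum over all corners to the single nearest corner can be checked numerically. This sketch assumes the Euclidean distance $d_2$ and a hypothetical rising potential $h(x) = 1 - e^{-\sigma x^2}$; any monotonically increasing $h$ would do, and the helper names are ours.

```python
import math

def nearest_corner(mu):
    """F^near: the crisp singleton at (the first) argmax of the memberships."""
    j = max(range(len(mu)), key=lambda i: mu[i])
    return [1.0 if i == j else 0.0 for i in range(len(mu))]

def d2(a, b):
    """Euclidean distance d_2 between membership vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def h_rising(x, sigma=4.0):
    """ASSUMED potential: 0 at x = 0, rising monotonically towards 1."""
    return 1.0 - math.exp(-sigma * x * x)

def H_sa_nearest(mu):
    """Eq. (6.4): h(d(F, F^near))."""
    return h_rising(d2(mu, nearest_corner(mu)))

def H_sa_all_corners(mu):
    """Eq. (6.2) with the min t-norm: minimum of h over all corners of C_s."""
    k = len(mu)
    corners = ([1.0 if i == j else 0.0 for i in range(k)] for j in range(k))
    return min(h_rising(d2(mu, c)) for c in corners)
```

Both functions agree for any input; the nearest-corner form only needs one distance evaluation instead of $k$.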


The other approach consists in measuring the distance from the least specific set $F_{1/k}$. Figures 6.3(a) and (b) give a pictorial impression of this technique. Both figures show the regions around the corners of $C_s$ where ambiguity should be low. In eq. (6.2) this is achieved by explicit aggregation of contributions from all the unambiguous points. Figure 6.3(b) shows qualitatively the same situation with a much more convenient representation: one large region around the center $F_{1/k}$ of $K_s$ defines the domain of high values of non-specificity.

$$H_{sc}(F) := \bar h\big(a(F, F_{1/k})\big). \qquad (6.5)$$

The function $\bar h$ is high for distances close to zero and falls off for bigger distances. We may set for example $\bar h(x) = 1 - h(x)$.


Setting $a(F, F') := d_q(F, F') / d_q(F_{1/k}, F'')$ with $F'' \in C_s$, we can assume that the distance from any corner set $F''$ to the center $F_{1/k}$ is equal to unity irrespective of the dimension. One example of a potential $\bar h(x)$ is given by $\bar h(x) := 1 + h_0\,(e^{-(\ldots)} - 1)$. We have $\bar h(0) = 1$ and $\bar h(1) = 0$ if $h_0 := 1/(1 - e^{-(\ldots)})$. Figure 6.5(a) shows an example for $F_{2D} = (\mu_1, \mu_2)$.


Another possibility is to use $\bar h(x) = 1 - x^2$, which gives a measure that takes values between zero and one:

$$H_{sc}(F) = 1 - \frac{k}{k-1} \sum_{i=1}^{k} \Big(\mu_F(u_i) - \frac{1}{k}\Big)^2 \qquad (6.6)$$
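Eq. (6.6) is $1 - a^2$ with the distance to $F_{1/k}$ normalized by the center-to-corner distance $\sqrt{(k-1)/k}$. The sketch below computes it both directly and through that normalized distance. The extension to $K_f$ uses eq. (6.3) with $f \equiv 1$ for simplicity, which is an assumption (it leaves $F_0$ undefined); the names are ours.

```python
import math

def H_sc(p):
    """Eq. (6.6) for a sum-normal vector p (sum(p) == 1), k >= 2."""
    k = len(p)
    return 1.0 - k / (k - 1) * sum((x - 1.0 / k) ** 2 for x in p)

def H_sc_via_distance(p):
    """Same value via 1 - a^2, a = d_2(F, F_1/k) / d_2(F_1/k, corner)."""
    k = len(p)
    center = [1.0 / k] * k
    corner = [1.0] + [0.0] * (k - 1)
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    a = dist(p, center) / dist(corner, center)   # normalized distance
    return 1.0 - a * a                            # h_bar(x) = 1 - x^2

def H_sc_kf(mu):
    """Extension to K_f via eq. (6.3) with f = 1 (no smoothing near F_0)."""
    s = sum(mu)
    if s == 0.0:
        raise ValueError("undefined for F_0 without a smoothing f")
    return H_sc([m / s for m in mu])
```

For example `H_sc([0.5, 0.5, 0.0])` and `H_sc_kf([1.0, 1.0, 0.0])` give the same value, since both sets lie on the same ray through $F_0$.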


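In complete analogy we can define measures of non-specificity in $K_n$. We state the expressions only briefly:

$$H_{na}(F) := \bigwedge_{F' \in C_s} h\big(a(F, F')\big) \quad \text{if } F \in K_n, \qquad (6.7)$$
$$H_{na}(F) := f(\beta^{-1})\, H_{na}(\beta F) \quad \text{with } \beta^{-1} := \max_i \mu_F(u_i). \qquad (6.8)$$

Using $\bigwedge = \min$ we have again

$$H_{na}(F) := h\big(d_q(F, F^{near})\big) \qquad (6.9)$$

with $F^{near}$ defined as above. The other approach consists in measuring the distance from the least specific set $F_d(1) = (1, \ldots, 1)$:

$$H_{nc}(F) := \bar h\big(a(F, F_d(1))\big). \qquad (6.10)$$

The function $\bar h$ is high for distances close to zero and falls off for bigger distances. Setting for example $\bar h(x) = 1 - x^2$ and normalizing the distance by $d_2(F_0, F_d(1)) = \sqrt{k}$, we obtain a measure that returns values between zero and one:

$$H_{nc}(F) = 1 - \frac{1}{k} \sum_{i=1}^{k} \big(\mu_F(u_i) - 1\big)^2 \qquad (6.11)$$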
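A minimal sketch of eq. (6.11) in the form $H_{nc}(F) = 1 - \frac{1}{k}\sum_i (\mu_F(u_i) - 1)^2$, extended to $K_f$ through eq. (6.8) with $f \equiv 1$ (an assumption that leaves $F_0$ undefined). Under this form the crisp sets in $C_s$ receive the smallest value within $K_n$, namely $1/k$, while $F_d(1)$ receives 1. The function names are ours.

```python
def H_nc(mu):
    """Eq. (6.11) as given above, for a max-normal vector: 1 - (1/k) * sum (mu_i - 1)^2."""
    k = len(mu)
    return 1.0 - sum((m - 1.0) ** 2 for m in mu) / k

def H_nc_kf(mu):
    """Extension via eq. (6.8) with f = 1: evaluate H_nc on beta*F, beta^-1 = max(mu)."""
    h = max(mu)
    if h == 0.0:
        raise ValueError("undefined for F_0 without a smoothing f")
    return H_nc([m / h for m in mu])
```

Because the extension first max-normalizes, `H_nc_kf([0.5, 0.0, 0.0])` equals `H_nc([1.0, 0.0, 0.0])`.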
Figure 6.4: The geometry underlying measures of non-specificity. For ease of visibility the case of fuzzy sets in two dimensions $F_{2D} = (\mu_1, \mu_2)$ is depicted.


6.2.2 Measuring Non-Specificity with potentials in $K_f$.


We can also define non-specificity directly on $K_f$. Instead of stating eq. (6.2) again for general $F \in K_f$, let us just note that

$$H(F) = h\big(d_q(F, F^{near})\big) \qquad (6.12)$$

can be applied to fuzzy sets in $K_f$ as well. The formulas for $H_{sc}$ and $H_{nc}$ cannot be applied directly, because the most specific fuzzy sets in $C_s$ are not the sets having the highest distance from $F_{1/k}$ or $F_d(1)$.


To define estimates of non-specificity in $K_f$ we no longer measure the distance to $C_s$, but use distances to other regions of $K_f$: to the diagonal $F_d(\beta)$ and to the simplex $K_s$. Consider figure 6.4(a), which shows the regions around the corners of $C_s$ where non-specificity should be low. Figure 6.4(d) shows qualitatively the same situation: it can be obtained by combining figures (b) and (c). In figure (b) all the sets around $K_s$ get low values of non-specificity. Thus all the points close to the unambiguous sets $C_s$ obtain low values of non-specificity. However, sets close to the center $F_{1/k}$ of $K_s$ also achieve low values of non-specificity. This is fixed in figure (c), where all the points close to the diagonal $F_d(\beta)$ are assigned high values of non-specificity. Both contributions taken together are depicted in figure (d), where only the regions around unambiguous sets remain. Note that the region around $F_0$ is considered highly non-specific with this approach. If this is unwanted, a third term contributing only around $F_0$ can change the value of non-specificity attributed to $F_0$. We will now work out these qualitative ideas more precisely.


To this end let us first state how to measure the distances of a fuzzy set $F$ to the simplex $K_s$ and to the diagonal $F_d$. In the following we assume the distance to be given by $d_2(F, F') = \big(\sum_i (\mu_i - \mu'_i)^2\big)^{1/2}$. We define the unit vector along $F_d$ through $e_d := \frac{1}{\sqrt{k}}(1, \ldots, 1)$ and the vector corresponding to the center set $F_{1/k}$ by $\mu_{1/k} := \frac{1}{k}(1, \ldots, 1)$.

Now the signed distance $\delta(F, K_s)$ of $F$ to $K_s$ is given by

$$\delta(F, K_s) = e_d \cdot (\mu_F - \mu_{1/k}) = \frac{1}{\sqrt{k}} \Big(\sum_{i=1}^{k} \mu_F(u_i) - 1\Big) \qquad (6.13)$$
Figure 6.5: Examples of the new measures of ambiguity for $F_{2D} = (\mu_1, \mu_2)$, panels (a) to (c).


Note that $\delta(F, K_s)$ is negative for $F = F_0$ and positive for $F = F_d(1)$. The sign of $\delta(F, K_s)$ expresses on which side of the simplex $K_s$ the point $F$ is located. The distance of $F$ from the diagonal $F_d$ can be expressed by

$$\delta(F, F_d) = \big\| \mu_F - (e_d \cdot \mu_F)\, e_d \big\| = \Big(\sum_{i=1}^{k} \big(\mu_F(u_i) - \bar\mu_F\big)^2\Big)^{1/2} \qquad (6.14)$$

with $\bar\mu_F := \frac{1}{k} \sum_{j=1}^{k} \mu_F(u_j)$. Thus $\delta(F, F_d)$ measures the deviation from the average: its square is proportional to the statistical variance of the membership values $\mu_F(u_i)$. Defining $a(F, K_s) := \sum_i \mu_F(u_i) - 1$, we can now write down the measure of non-specificity


$$H_f(F) = 1 - h_1\big(a(F, K_s)\big) \wedge h_2\big(\delta(F, F_d)\big) \qquad (6.15)$$


The operation $a \wedge b$ may be $\min(a, b)$ or $a \cdot b$. $h_1(x)$ is a potential that is 1 for $x$ close to 0 and drops to zero for greater distances. $h_2(x)$ is a potential which is 0 for $x$ close to 0 and rises to 1 for greater distances. The above definition corresponds to the statement: to display low ambiguity, $F$ should be nearly sum-normal and the variance of its membership values should not be too low. Measures defined through equation (6.15) will usually fulfill the requirements of minimality, maximality and symmetry, but violate the scaling property.
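The two distances (6.13), (6.14) and the combined measure (6.15) can be sketched as below. The Gaussian-type potentials $h_1(x) = e^{-\sigma x^2}$, $h_2(x) = 1 - e^{-\sigma x^2}$ and the value $\sigma = 20$ are assumptions for illustration only; the function names are ours.

```python
import math

def delta_simplex(mu):
    """Eq. (6.13): signed Euclidean distance of F to the simplex K_s."""
    return (sum(mu) - 1.0) / math.sqrt(len(mu))

def delta_diagonal(mu):
    """Eq. (6.14): Euclidean distance of F to the diagonal F_d."""
    mean = sum(mu) / len(mu)
    return math.sqrt(sum((m - mean) ** 2 for m in mu))

def H_f(mu, sigma=20.0, tnorm="product"):
    """Eq. (6.15) with ASSUMED Gaussian-type potentials h1 and h2."""
    a = sum(mu) - 1.0                      # a(F, K_s)
    h1 = math.exp(-sigma * a * a)          # 1 near K_s, drops away from it
    d = delta_diagonal(mu)
    h2 = 1.0 - math.exp(-sigma * d * d)    # 0 near the diagonal, rises away from it
    both = h1 * h2 if tnorm == "product" else min(h1, h2)
    return 1.0 - both
```

With these choices the center $F_{1/k}$ and $F_0$ both come out as highly non-specific, while the corners of $C_s$ score close to zero, matching the qualitative picture of figure 6.4(d).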


Figure 6.5 shows examples of the new measures of ambiguity. In figure 6.5(b) we have $H_f$ with $a \wedge b = a \cdot b$, $h_1(x) = e^{-(\ldots)}$ and $h_2(x) = 1 - e^{-(\ldots)}$, while in figure 6.5(c) $a \wedge b = \min(a, b)$. Note that $F_0$ is no longer a point of minimal ambiguity. If minimal ambiguity for $F_0$ is wanted, another term can be added that enforces it.


6.3 Summary and Discussion 


In this chapter we have presented many existing measures of non-specificity in detail. We have also defined various new measures according to the ideas introduced in chapter 4. Some of these measures will be applied during the action planning phase of our active recognition algorithm, which will be described in more detail in chapters 8 and 9. For example, Shannon entropy will be used to measure non-specificity for the probabilistic formulation of our algorithm, and Kaufmann entropy will be applied in both the possibilistic and the evidence theoretic implementations. We will also compare Kaufmann entropy, Lamata and Moral's measure and the new measure defined through eq. (6.11) in an extensive set of experiments.


But this chapter has spanned a much wider range of possible cases than we will consider in the following. We have reviewed existing measures and introduced new ones for distributions of confidence values which are subject to different constraints: sum-normal distributions, which are most prominently found in probabilistic models; max-normal distributions, which are characteristic of possibilistic approaches; and distributions without any global constraint (the only local constraint being that the confidences lie in $[0,1]$), which arise when using the most general fuzzy aggregation operations. No matter what kind of formalism is chosen, the analysis given in this chapter can provide a reference for the researcher who needs to measure non-specificity. Besides active fusion there is a wide range of problems in which this need arises. Whenever one option has to be selected from a set of possible alternatives, measures of non-specificity can provide an estimate of how unambiguous the choice will be.


We have stated particular features of various measures (such as differentiability) which can help in deciding which measure to choose under which circumstances. But can we do better than this? Can we provide a detailed prescription of which measure will perform best under given circumstances? Unfortunately we have to answer this question in the negative. The situation encountered with different measures of non-specificity is similar to the situation described by Bloch for different fusion operators [24]: we can give general guidelines, but once all necessary constraints have been considered, the choice among the remaining possible measures of non-specificity remains a matter of trial and error. There is no "easy" route to the best measure of non-specificity. For example, we cannot predict whether the newly defined measures will outperform existing ones, and if so, under which conditions this will happen². The given presentation can only help researchers to find some good measures without too much effort. The newly introduced measures enlarge the arsenal of available tools for quantifying non-specificity.


We conclude the discussion of measures of non-specificity by noting that the above 
comments will apply only partially for measures of numerical imprecision. We will see in 
the following chapter that some measures such as standard deviation display unwanted 
properties under certain circumstances. It will be possible to define clearly the condi- 
tions under which it can be advantageous to apply alternative estimates of numerical 
imprecision. 


²That is exactly the reason why we will try to answer some of these questions through experiments.


Chapter 7 


Measuring Numerical Imprecision. 


It has been mentioned above that the requirement of symmetry for measures of ambiguity 
might lead to counter-intuitive results. To cite Pearl [98](p.322): “The main weakness 
of Shannon’s measure is that it does not reflect ordering or scale information relative 
to the values that a variable may take.” Since all measures considered so far share the 
property of symmetry under permutation the above citation applies equally well to the 
other measures of imprecision we have reviewed and defined. 


In order to motivate the following discussion, let us briefly review how numerical imprecision is usually measured in scientific experiments. In the vast majority of cases the "perfect" outcome of such experiments would consist of a single very precise value (of temperature $t$, for example). Most often it is assumed that various statistically uncorrelated random errors occur in a set of experiments, leading to a Gaussian distribution $p(t_i)$ of the measured values $t_i$. The width of that Gaussian distribution is then estimated by its variance:


$$\sigma^2 = \sum_i p(t_i)\,(t_i - \bar t)^2 = \frac{1}{2} \sum_i \sum_j p(t_i)\,p(t_j)\,(t_i - t_j)^2 \qquad (7.1)$$

with $\bar t = \sum_i p(t_i)\,t_i$ and assuming $\sum_i p(t_i) = 1$. The right-hand side of the above equation corresponds to the most commonly used square error cost criterion in decision analysis [98] and serves to motivate the following definition. We suggest basing estimates of numerical imprecision on the cost associated with a certain distribution of membership values:


$$\tilde\sigma^2(F) := \sum_i \sum_j \hat\mu_F(u_i)\,\hat\mu_F(u_j)\,\mathrm{cost}(u_i, u_j). \qquad (7.2)$$


The function $\mathrm{cost}: U \times U \to \mathbb{R}$ measures the average actual cost (in terms of damage caused) if we assume one of the singletons $u \in \{u_i, u_j\}$, e.g. $u_i$, to be the "right" value (for example, the correct pose estimate) while the other one, $u_j$, is the actual correct choice. We compare "ideal models" which are given by $u_i$ and $u_j$. The above definition implies $\mathrm{cost}(u_i, u_j) = \mathrm{cost}(u_j, u_i)$ without loss of generality, because non-symmetric costs will be symmetrized by eq. (7.2). $\hat\mu_F(u_i)$ is defined through one of the following expressions:
1. $\hat\mu_F(u_i) = \mu_F(u_i)$. Then $\tilde\sigma^2(F)$ measures the total cost that occurs if a value $u_i$ is mistakenly taken instead of a value $u_j$, weighted with the confidences that singletons $u_i$ and $u_j$ are selected at all.

2. $\hat\mu_F(u_i) = \mu_F(u_i) / \sum_j \mu_F(u_j)$. Now $\tilde\sigma_s^2(F)$ measures the average cost to be expected when singletons $u_i$ and $u_j$ are selected with probability proportional to $\mu_F(u_i)$ and $\mu_F(u_j)$. To avoid difficulties we define $\hat\mu_F(u_i) := 0$ if $\sum_j \mu_F(u_j) = 0$.

3. $\hat\mu_F(u_i) := \mu_F(u_i) / \max_j \mu_F(u_j)$. Then $\tilde\sigma_n^2(F)$ measures the possible cost to be expected when we are obliged to take one of the singletons $u_i$ (no matter how small the actual confidences may be) and the possibility of taking $u_i$ is equal to $\hat\mu_F(u_i)$. We define $\hat\mu_F(u_i) := 0$ if $\mu_F(u_{max}) = 0$.


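For $\mathrm{cost}(u_i, u_j) = \frac{1}{2}(u_i - u_j)^2$ we obtain $\tilde\sigma^2(F) = \big(\sum_i \hat\mu_i\big)^2 \sigma_{\hat\mu}^2$, where $\sigma_{\hat\mu}^2$ is the "variance" of the distribution of values $u_i$, defined through $\sigma_{\hat\mu}^2 := \overline{u^2}_{\hat\mu} - \big(\overline{u}_{\hat\mu}\big)^2$ with $\overline{u^m}_{\hat\mu} := \big(\sum_i \hat\mu_i u_i^m\big) / \big(\sum_i \hat\mu_i\big)$. For sum-normal distributions $\sigma_{\hat\mu}^2$ corresponds to the statistical variance, and $\tilde\sigma^2(F)$ corresponds exactly to eq. (7.1). Other cost functions which increase with distance in a different way can be envisaged as well. Whenever $\mathrm{cost}(u_i, u_j) = g(u_i - u_j)$, the double sum in $\tilde\sigma^2(F)$ can be reduced to a single sum over the Fourier-transformed quantities because of translation invariance.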
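The three normalization variants of eq. (7.2) and the link to the variance can be sketched as follows. The helper names and the example values are ours; the cost term is the "natural" $\frac{1}{2}(u_i - u_j)^2$ from the text, and $F_0$ is mapped to all zeros following the conventions stated in the list above.

```python
def mu_hat(mu, mode="raw"):
    """Normalization variants (1) raw, (2) sum-normalized, (3) max-normalized.
    For F_0 all variants return zeros, following the conventions in the text."""
    if mode == "sum":
        s = sum(mu)
        return [m / s for m in mu] if s > 0 else [0.0] * len(mu)
    if mode == "max":
        h = max(mu)
        return [m / h for m in mu] if h > 0 else [0.0] * len(mu)
    return list(mu)

def sigma_tilde_sq(mu, u, cost, mode="raw"):
    """Eq. (7.2): sum_i sum_j mu_hat(u_i) mu_hat(u_j) cost(u_i, u_j)."""
    w = mu_hat(mu, mode)
    return sum(wi * wj * cost(ui, uj)
               for wi, ui in zip(w, u)
               for wj, uj in zip(w, u))

def half_sq(ui, uj):
    """The 'natural' cost term 1/2 (u_i - u_j)^2."""
    return 0.5 * (ui - uj) ** 2
```

With `mode="sum"` the result coincides with the ordinary variance of eq. (7.1); with `mode="raw"` it is scaled by $(\sum_i \mu_i)^2$, and with `mode="max"` it is invariant under $F \mapsto \beta F$.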


Another immediate generalization lies in the application of linguistic modifiers, which can also break the symmetry in the cost function:

$$\tilde\varsigma^2(F) = s\Big(\sum_i \sum_j \mathrm{very}_1\big(\hat\mu_F(u_i)\big)\,\mathrm{very}_2\big(\hat\mu_F(u_j)\big)\,\mathrm{cost}(u_i, u_j)\Big) \qquad (7.3)$$

$s: \mathbb{R} \to \mathbb{R}$ is a monotonically increasing function. The linguistic hedges $\mathrm{very}_1$ and $\mathrm{very}_2$ could be realized by functions like $\mathrm{very}_i(x) = x^{p_i}$. If $p_1 \neq p_2$ the symmetry is broken. The above formula measures the weighted (1), average (2) or possible (3) total cost that occurs if any value $u_j$ with very high membership $\hat\mu_F(u_j)$ is mistakenly taken instead of $u_i$ with very high membership $\hat\mu_F(u_i)$. The measuring scale will be non-linear if $s(x)$ is non-linear.
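As a sketch of eq. (7.3), assume $\mathrm{very}_i(x) = x^{p_i}$ and $s(x) = a\,x^r$ (the same assumptions as in the footnote of Table 7.1). Under these choices the raw variant scales as $\beta^{r(p_1+p_2)}$, which the test below checks numerically. The parameter values are arbitrary examples and the names are ours.

```python
def varsigma_sq(w, u, cost, p1=2.0, p2=1.0, a=1.0, r=1.0):
    """Eq. (7.3) with the ASSUMED choices very_i(x) = x**p_i and s(x) = a * x**r.
    w are the (raw or normalized) memberships mu_hat(u_i)."""
    inner = sum((wi ** p1) * (wj ** p2) * cost(ui, uj)
                for wi, ui in zip(w, u)
                for wj, uj in zip(w, u))
    return a * inner ** r

def half_sq(ui, uj):
    """The 'natural' cost term 1/2 (u_i - u_j)^2."""
    return 0.5 * (ui - uj) ** 2
```

With $p_1 \neq p_2$ the effective weight of the pair $(u_i, u_j)$, $w_i^{p_1} w_j^{p_2}$, differs from that of $(u_j, u_i)$, which is the symmetry breaking described above.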


$\tilde\sigma^2(F)$ and $\tilde\varsigma^2(F)$ fulfill some scaling properties which are listed in table 7.1. We can see that for fuzzy sets $\beta F$ with small membership values ($\beta < 1$) $\tilde\sigma^2(\beta F)$ will be low. This effect can be unwanted, since it couples the height $h(F)$ to the amount of ambiguity present in $F$, and there is no a-priori reason for introducing such a correlation into a system measuring numerical imprecision.


Exemplary results obtained with the new measures are depicted in Fig. 7.1. Figures 7.1(a) and (b) depict $\tilde\sigma^2(F)$ and $\tilde\sigma_n^2(F)$ in the case of a 3-dimensional universe $U = \{u_1, u_2, u_3\}$ of distance values. We have chosen the "natural" cost term

$$\mathrm{cost}(u_i, u_j) = \frac{1}{2}(u_i - u_j)^2,$$

which directly relates our measures of numerical imprecision to the variance as defined through eq. (7.1). The distance values have been chosen to be $u_1 = 0$, $u_2 = 15$ and $u_3 = 20$ (measured e.g. in cm) to underline the effect of the asymmetry in the figures¹. Both figures 7.1(a) and (b) depict slices cut through $K_f$, with brighter values indicating higher ambiguity. It can be seen that we no longer have a symmetric distribution of ambiguity values. Instead, in both figures the set $F = (1,0,1)$ shows higher ambiguity than $F = (0,1,1)$, as we expect, since the distance between $u_1$ and $u_3$ is much bigger than the distance between $u_2$ and $u_3$. It can also be seen in figure 7.1(a) that for $\tilde\sigma^2(F)$ ambiguity falls off quickly as membership values are lowered. This effect is eliminated in fig. 7.1(b).

$\hat\mu_F(u_i)$ | Scaling of $\tilde\sigma^2$ | Scaling of $\tilde\varsigma^2$
$\mu_F(u_i)$ | $\tilde\sigma^2(\beta F) = \beta^2\,\tilde\sigma^2(F)$ | $\tilde\varsigma^2(\beta F) = \beta^{r(p_1+p_2)}\,\tilde\varsigma^2(F)$ (*)
$\mu_F(u_i) / \sum_j \mu_F(u_j)$ | $\tilde\sigma_s^2(\beta F) = \tilde\sigma_s^2(F)$ | $\tilde\varsigma_s^2(\beta F) = \tilde\varsigma_s^2(F)$
$\mu_F(u_i) / \max_j \mu_F(u_j)$ | $\tilde\sigma_n^2(\beta F) = \tilde\sigma_n^2(F)$ | $\tilde\varsigma_n^2(\beta F) = \tilde\varsigma_n^2(F)$

(*) Assuming that $s(x) = a\,x^{r}$ and $\mathrm{very}_i(x) = x^{p_i}$.

Table 7.1: Scaling properties.

[Figure 7.1, panels: (a) $\tilde\sigma^2(F_{3D})$, (b) $\tilde\sigma_n^2(F_{3D})$, (c) axes of the unit cube with corners $(0,0,0)$ to $(1,1,1)$.]

Figure 7.1: Measures of fuzzy ambiguity taking into account the cost through the Euclidean distance of the events $u_1 = 0$, $u_2 = 15$, $u_3 = 20$. The slices are depicted for fuzzy sets $F_{3D} = (\mu_1, \mu_2, \mu_3)$.


7.1. Numerical Imprecision for Cyclic Variables 


In our experiments we rotate the objects around the z-axis. Thus we are interested not only in measuring the numerical imprecision of variables such as "length", but also in quantifying the numerical imprecision of angular variables. The above discussion is equally valid for cyclic variables such as angles, as long as one does not specify the cost term. We have seen that $\mathrm{cost}(u_i, u_j) = (u_i - u_j)^2$ is a useful cost term for non-periodic variables. However, this "natural" cost term is clearly inappropriate for angular variables. For example, if $u_i = 350°$ and $u_j = 10°$, then we should actually take the minimum difference $20°$ and not $340°$ in the cost term. The cost should thus be a function of $\Delta\varphi := u_i - u_j$, defined as follows (measuring angles in radians from now on):

$$\min\nolimits_{2\pi}(\Delta\varphi + 2\pi) = \min\nolimits_{2\pi}(\Delta\varphi),$$
$$\min\nolimits_{2\pi}(\Delta\varphi) := \begin{cases} \Delta\varphi & \Delta\varphi \in [0, \pi] \\ 2\pi - \Delta\varphi & \Delta\varphi \in [\pi, 2\pi] \end{cases}$$

From the above definition we have $\min_{2\pi}(-\Delta\varphi) = \min_{2\pi}(\Delta\varphi)$. We will use

$$\mathrm{cost}(u_i - u_j) = \frac{2}{\pi^2}\big(\min\nolimits_{2\pi}(u_i - u_j)\big)^2. \qquad (7.4)$$

This cost function is depicted in figure 7.2. The factor $2/\pi^2$ has been chosen to ensure that the maximum value of numerical imprecision is 1 when using sum-normal distributions of confidence values.

Figure 7.2: The cost function $\mathrm{cost}(\Delta\varphi)$ as defined through eq. (7.4).

¹The same figures would be obtained for a uniform universe ranging from $u_0 = 0, u_1 = 1, \ldots, u_{20} = 20$ with all membership values equal to zero apart from $\hat\mu_F(0)$, $\hat\mu_F(15)$ and $\hat\mu_F(20)$. Such a situation can arise, for example, because the pose estimation module only considers these three values to be possible.
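The periodic cost of eq. (7.4) can be sketched as below, assuming angles in radians and $n$ equally spaced poses for the example. It shows the $350°$ versus $10°$ case from the text and checks the normalization claim: two equal masses at opposite poses give an imprecision of exactly 1. The function names are ours.

```python
import math

def min_2pi(dphi):
    """min_2pi of eq. (7.4): the smallest angular difference, in [0, pi]."""
    d = math.fmod(dphi, 2.0 * math.pi)
    if d < 0.0:
        d += 2.0 * math.pi          # map into [0, 2*pi)
    return d if d <= math.pi else 2.0 * math.pi - d

def cyclic_cost(ui, uj):
    """Eq. (7.4): (2 / pi^2) * min_2pi(u_i - u_j)^2, angles in radians."""
    return 2.0 / math.pi ** 2 * min_2pi(ui - uj) ** 2

def cyclic_imprecision(c, angles):
    """Eq. (7.2) with the cyclic cost for sum-normal confidences c."""
    return sum(ci * cj * cyclic_cost(ai, aj)
               for ci, ai in zip(c, angles)
               for cj, aj in zip(c, angles))
```

For instance, `min_2pi(math.radians(350) - math.radians(10))` yields the $20°$ difference (in radians), not $340°$.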


7.2. The Problem with the Variance 


In this section we will discuss why using the variance can be counter-productive for active 
vision systems that model uncertainty by sum-normal distributions of confidence values 
(such as e.g. probabilities). 


To this end we first define the type of distribution which should be regarded as maximally imprecise by a measure of numerical imprecision. For active vision purposes the answer is clear: the system is unable to extract any useful information from the images if all poses remain equally likely. Thus a uniform distribution of confidence values should be regarded as the most imprecise. Other imprecise configurations exist. For example, if the system assigns equal confidences to two maximally separated hypotheses, the value of numerical imprecision should certainly be high. However, from the perspective of active fusion such a situation can quite easily be resolved by investigating one of the two competing hypotheses in more detail. Also, for certain objects one can never decide on one specific pose. We will encounter a mirror-symmetric object below for which the best conceivable pose estimation results in two maximally separated, equal confidence values. Such an output should therefore not be regarded as the most imprecise configuration.
The problem with the variance is essentially this: the variance is not maximal for the uniform distribution, but for distributions with two competing hypotheses at maximal distance. This can be understood easily if one notes the formal equivalence of eq. (7.1) with the equation defining the moment of inertia in classical mechanics. Rods with uniform mass distributions have a lower moment of inertia than objects with two masses at maximum distance from the center.

The same discussion also applies to cyclic variables if the above cost term is used². For sum-normal confidence distributions the constraint $\sum_i c(u_i) = 1$ makes analytically estimating the upper bound of numerical imprecision a non-trivial task. It has been found numerically³ that the maximum is obtained when the "masses" are at maximum distance from each other, i.e. $\exists\, i, j$ such that $c(u_i) = c(u_j) = 0.5$ and $|u_i - u_j| = (2n + 1)\pi$ with $n \in \mathbb{N}_0$. This is analogous to non-cyclic coordinates, where one finds that the maximum variance is obtained if the total "mass" is equally distributed at the two endpoints of the considered interval. As the resolution in pose becomes finer (i.e. the number of possible pose values $n_\varphi$ gets bigger), we can distribute the confidence "mass" in two opposite poses with increasingly higher accuracy. From $n_\varphi \geq 10$ onwards numerical imprecision always has this same upper bound⁴.
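The numerical search mentioned in footnote 3 relies on the Baum-Eagon growth transform, which never decreases a homogeneous polynomial with non-negative coefficients on the probability simplex. A minimal sketch, assuming $n$ equally spaced poses, the cyclic cost of eq. (7.4), and the stopping rule $|c_i^{k+1} - c_i^k| < 10^{-4}$; the restart count, the seed and all names are ours, not the original experimental setup.

```python
import math
import random

def min_2pi(dphi):
    d = math.fmod(dphi, 2.0 * math.pi)
    if d < 0.0:
        d += 2.0 * math.pi
    return d if d <= math.pi else 2.0 * math.pi - d

def cost_matrix(n):
    """Cyclic cost (7.4) between n equally spaced poses."""
    ang = [2.0 * math.pi * i / n for i in range(n)]
    return [[2.0 / math.pi ** 2 * min_2pi(a - b) ** 2 for b in ang] for a in ang]

def objective(C, c):
    """sigma^2(c) = sum_ij c_i c_j cost(u_i - u_j)."""
    n = len(c)
    return sum(c[i] * c[j] * C[i][j] for i in range(n) for j in range(n))

def baum_eagon(C, c, tol=1e-4, max_iter=10_000):
    """Growth transform c_i <- c_i * dP/dc_i / sum_j c_j * dP/dc_j.
    The update keeps c on the simplex and never decreases P = c^T C c."""
    n = len(c)
    for _ in range(max_iter):
        grad = [2.0 * sum(C[i][j] * c[j] for j in range(n)) for i in range(n)]
        z = sum(ci * gi for ci, gi in zip(c, grad))
        if z == 0.0:
            return c
        new = [ci * gi / z for ci, gi in zip(c, grad)]
        if max(abs(a - b) for a, b in zip(new, c)) < tol:
            return new
        c = new
    return c

def random_restarts(n, runs=20, seed=0):
    """Best local maximum over several random initializations."""
    rng = random.Random(seed)
    C = cost_matrix(n)
    best = 0.0
    for _ in range(runs):
        w = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(w)
        best = max(best, objective(C, baum_eagon(C, [x / s for x in w])))
    return best
```

Calling `random_restarts(12)` estimates the upper bound for $n_\varphi = 12$. Each run only finds a local maximum, which is why several substantially different initializations are used.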


For active vision applications we therefore want to use another measure of numerical 
imprecision when using sum-normal confidences (e.g. probabilities). We would like to 
have a measure that is 


• not invariant under permutations, because it has to take into account the physical
distance between different hypotheses,


• maximal for uniform distributions of confidence values.


We have seen that measures of non-specificity are maximal for uniform distributions of
confidence values but invariant under permutations. An appropriate combination of both


• a quantity like $\sigma^2$
• and a measure of non-specificity (like Shannon entropy)
may display both properties. It can indeed be checked that


$$ H_{\sigma^2} \propto H\, \sigma^2 \tag{7.5} $$


²Measures of imprecision defined through eq. (7.4) are closely related to variance because eq. (7.4) is
the only natural generalization of $(u_i - u_j)^2$ to periodic variables.

³We have used a technique borrowed from relaxation labeling for that purpose. The task is to maximize
$\sigma^2(c) = \sum_{i,j} c_i c_j\, \mathrm{cost}(u_i - u_j)$ subject to the constraints $\sum_i c_i = 1$ and $c_i \in [0,1]$, with $c_i$ denoting $c(u_i)$.
A local maximum can be found using the Baum-Eagon theorem [11, 12, 99]: $c_i^{k+1} \propto c_i^k\, \partial\sigma^2(c^k)/\partial c_i^k$ with
$\sum_i c_i^{k+1} = 1$. In order to estimate the global maximum we have used substantially different random
initializations in about 10° test-runs, with convergence tested by $|c_i^{k+1} - c_i^k| < 10^{-4}$, for each $n_\varphi = 3, \ldots, 200$.

⁴This fact indicates that discretization effects are important only for $n_\varphi < 10$. For example, with
$n_\varphi = 3$ numerical imprecision becomes maximal for the uniform distribution. In our experiments the
maximum is always reached for hypotheses with maximum angular separation, since we use $n_\varphi = 72$, which
is well above 10.
with H denoting Shannon entropy fulfills both of the above-mentioned properties. It is
low for specific and numerically precise distributions and maximal for uniform distribu- 
tions (which are also numerically imprecise). 


Another possible alternative is the use of $\tilde\sigma^2$, the variance computed after transforming
all confidences to a max-normal distribution, which achieves its maximum for the
uniform distribution. All three measures will be applied in the following experiments (see
eqs. (8.11), (8.12) and (8.13) and the experimental results in section 9.7).


7.3 Summary and Discussion 


In this chapter we have introduced measures which quantify the subjective impression of
the accuracy of physical measurements. We have given the measures a twofold foundation:
first by motivating their introduction through their formal similarity to the expression
defining the variance, and second through an interpretation of their meaning in terms of
possibly occurring costs.


In view of the application we have in mind, the presentation has included both
periodic and non-periodic physical variables. For periodic variables all quantities have to
be measured modulo the length of one period. This modulo operation necessitates some
modifications regardless of the uncertainty calculus used. In the probabilistic case
the right hand side of eq. (7.1) is no longer equivalent to the left hand side if the cost is
measured through eq. (7.4). We therefore consider the definition on the right hand side
of eq. (7.1) to be the proper generalization to periodic variables.


We have noted that the variance as defined through eq. (7.1) (or with the alternative 
cost function eq. (7.4)) does not assume its maximum value for the uniform distribution 
but for the distribution with competing hypotheses at maximum distance. This feature 
is unwanted for active planning purposes. Especially if the object database contains sym- 
metric objects the active planning module will fail to work properly if the variance is used 
for measuring numerical imprecision in the probabilistic implementation. 


We have therefore suggested two other measures of numerical imprecision: $H_{\sigma^2}$,
a combination of Shannon entropy and variance, and $\tilde\sigma^2$, the variance with all confidences
transformed to a max-normal distribution. Both measures assume their maximum value
for the uniform distribution. We will compare the three measures of numerical
imprecision in chapter 9.


We have not considered the case of continuous distributions. This is in fact an
omission, since most physical quantities (such as angles, distances etc.) are actually continuous
variables. However, we do not expect any significant changes to be necessary to extend 
the given definitions to continuous variables. In actual applications we will always deal 
with discretized values for the physical quantities under consideration. 


The formalism should also be extended to multi-dimensional variables for systems with 
more than one degree of freedom. Again, in most cases replacing the cost term $(u_i - u_j)^2$
by $\|u_i - u_j\|^2$ (with $\|x\|$ denoting the length of vector $x$) should suffice to obtain the
corresponding expressions for vectorial quantities $u_i, u_j$.


We are now ready to formulate the missing details of the active fusion algorithm. In 
the following chapter 8 we will present various concrete implementations using different 
measures of imprecision. Experimental results obtained with these implementations will 
be reported in chapter 9. 
Chapter 8


Case Studies in Active Fusion for 
Object Recognition 


Here we continue our discussion on active object recognition. This chapter serves to 
introduce the details of the different implementations of the active fusion algorithm. The 
following chapter is devoted to experimental studies. The reader who has taken the left 
route in Fig. 1.3 and has made himself acquainted with the most prominent measures of 
imprecision may quickly review our goals as stated at the end of chapter 3: 


• We will study different implementations of the general active vision algorithm in
order to find out whether some uncertainty calculi display specific advantages over
others when applied to active fusion tasks, and to evaluate the potential of different
fusion schemes in active object recognition.


• Secondly, we will evaluate experimentally different measures of non-specificity and
numerical imprecision. For some theories the measures of choice are quite obvious
(e.g. Shannon entropy in probability theory). For other uncertainty calculi a bigger
pool of possible measures exists. As stated in chapter 3 it is not clear a priori
whether any of them is particularly suited for our purposes. We will evaluate the 
experimental impact of using different measures of non-specificity. Furthermore 
we have already stated that common measures of numerical imprecision (such as 
standard deviation in probability theory) display properties which are not wanted 
in active object recognition. We will perform experimental runs to back up the 
claim that some of the newly introduced measures of numerical imprecision lead to 
a more reasonable behavior of the overall system. 


In the following we will describe a probabilistic, a “standard” possibilistic, a “stan- 
dard” (minimal) evidence-theoretic approach, and various fusion schemes for fuzzy logic. 
All of the presented formulations will follow a common framework, the basis of which has
been established in chapter 3. Consequently there will be strong similarities between the 
algorithms as the different uncertainty calculi will be used to implement the same model. 


In particular, it will always be assumed that we have an object recognition system 
that returns scalar confidences for class and pose estimations. This is not necessarily the 
case if the object recognition system is built using for example possibilistic or evidence
theoretic reasoning. In the possibilistic case we may obtain confidence intervals from the 
recognition system while in the evidence theoretic case we may obtain confidence values 
(masses) also for sets consisting of multiple object-pose hypotheses. Thus neither the 
possibilistic nor the evidence theoretic formulation given below will exploit all features 
of possibility theory or evidence theory. Instead we have adopted a strategy of minimal 
differences between the different implementations in order to study the overall effects of 
these localized small changes within the general algorithm presented in chapter 3. Thus 
the main differences between the implementations will lie in


• the initialization of the confidences (probabilities, possibilities etc.) for the action
planning module from the classification results,


• the fusion of uncertainty values (probabilistic product, possibilistic min, fuzzy
aggregation, Dempster-Shafer orthogonal sum),


• the measures of non-specificity (and numerical imprecision) used to evaluate the
utility of a specific action.


Since we will subsequently also apply the different implementations to the same actual 
problems we may be tempted to compare the results to identify the implementation (or 
boldly speaking even the uncertainty calculus) that is “best suited” for active recognition. 
Of course such a conclusion is not without its problems. Various issues can never be set- 
tled satisfactorily when comparing different uncertainty calculi through making concrete 
experiments. Even in our restricted case we face at least three major problems: 


• We have to decide how to initialize the reasoning schemes from the output of the
object recognition module. This is especially critical in the case of evidence theory,
because the assignment of masses is where this theory leaves the greatest freedom.
Since the object recognition module will always be based upon the learned
frequencies (probabilities) of feature values $\mathbf{g}$, we face the problem of how to design
the interface between modules relying on different uncertainty calculi. This issue
is the subject of ongoing research. Thus the proposed initialization schemes cannot be
derived from (not yet existing) fundamental theories but are rather motivated by
heuristic arguments.


• We have to decide on a specific implementation of fusion operators. Probabilistic
fusion usually relies on (more or less well justified) independence assumptions and
our implementation below will be no exception. In evidence theory Dempster’s rule
of combination settles the issue. For possibility theory and fuzzy logic many
fuzzy aggregation operators are available, but apart from some general guidelines [24]
the proper choice among them still remains a matter of trial and error.
Hence, our results have to be interpreted with caution. We can never exclude the
possibility that another, not yet chosen, aggregation operator could perform better.


• The same comment applies to measures of non-specificity and numerical imprecision.
In probability theory usually Shannon entropy and some sort of standard deviation
are applied. In possibility theory, fuzzy logic and evidence theory some freedom of
choice exists¹. Again we have no theory that could tell us beforehand whether there is any
measure which gives the best recognition performance with the minimum number
of observations.


The mentioned difficulties may be summarized as follows: Even adopting a strategy of 
minimal differences between different implementations the freedom in design is enormous 
and it may be possible that an implementation which would be more successful has been 
“overlooked”. In addition, the question which initialization, fusion and planning schemes 
give best results may receive different answers in different environments. When doing 
real experiments we have to decide on a specific model database and a specific object 
recognition strategy. It is not necessarily the case that the results to be reported below 
can be generalized to other settings. 


Nevertheless, we will be able to establish some general conclusions from our study: 


• Even though we make only reduced use of specific uncertainty calculi, all of the
implementations are successful in the sense that they are able to solve the problem of
guiding active recognition. For the special eigenspace representation of the chosen
object models and the applied object-pose classifier we can of course compare
performances. The results of this comparison will be discussed in chapter 9.
Even though this comparison is subject to the mentioned caveats, it may be helpful
to have such results in as many settings as possible. The presented experiments
constitute one step in this direction.


• It will be possible to establish some necessary properties of successful fusion
operators. In particular, we will see that the conjunctive aggregation of confidences is
best suited for active fusion purposes, while a disjunctive fusion scheme is clearly
inappropriate.


• Conjunctive schemes will turn out to be comparatively sensitive to outliers. We
will therefore also make experiments with averaging fusion schemes, showing that
strong resistance against noise and outliers can be obtained by the use of such
compromising aggregation operators. There is a price to pay for this: active fusion
will become less effective. We will demonstrate this and discuss the issue in more
detail below.


8.1 Probabilistic Active Fusion


Let us start our discussion of different implementations with the probabilistic formulation. 
In our experiments object recognition will always be based upon the learned likelihoods 
in eigenspace. Hence the probabilistic implementation must give optimal results if the 
likelihoods are accurate and the independence assumptions which will be necessary to set 


¹The case of evidence theory appears to be even more involved, since measures of numerical
imprecision for mass assignments do not seem to have been established in the literature so far.
up the probabilistic model are fulfilled. As mentioned in chapter 3 we have restricted
motions to rotations of the turn-table for the purpose of demonstrating the principles of
active fusion. We therefore have $T_{a_n}\psi_n = \Delta\psi + \psi_n$, with $\psi_n$ denoting the angular position
of the turn-table and $\Delta\psi$ standing for the angular rotation. It is mainly a matter of more
complicated notation to write down the formulas in case of more degrees of freedom [27]. 


8.1.1 Probabilistic Object and Pose Classification 


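We use probabilities $P(o_i \mid I_n)$ and $P(\hat\varphi_j \mid o_i, I_n)$ to measure the confidence in object and
pose hypotheses. From the learned likelihoods $p(\mathbf{g}_n \mid o_i, \hat\varphi_j)$ we obtain with the rule of
conditional probabilities


$$ P(o_i, \hat\varphi_j \mid I_n) = P(o_i, \hat\varphi_j \mid \mathbf{g}_n) = \frac{p(\mathbf{g}_n \mid o_i, \hat\varphi_j)\, P(\hat\varphi_j \mid o_i)\, P(o_i)}{p(\mathbf{g}_n)} \tag{8.1} $$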


In our experiments $P(o_i)$ will be uniformly distributed, as the system does not know
which object will be placed on the table. We will also use a uniform $P(\hat\varphi_j \mid o_i)$, because we
consider only rotations around the z-axis and the system does not know which pose the
human operator has chosen for the object under consideration. The assumption of a
uniform $P(\hat\varphi_j \mid o_i)$ is not true in general, since every rigid object has only a certain number
of stable initial poses. If, however, the parameter $\hat\varphi_j$ is chosen such that it does not
include unstable poses², then there is no need to consider a non-uniform $P(\hat\varphi_j \mid o_i)$ as long
as every stable initial pose is equally likely to arise (and the system does not have any
additional knowledge about which stable initial poses will be chosen by the external operator).


Given the vector $\mathbf{g}_n$ in eigenspace, the conditional probability for seeing object $o_i$ is


$$ P(o_i \mid I_n) = P(o_i \mid \mathbf{g}_n) = \sum_j P(o_i, \hat\varphi_j \mid \mathbf{g}_n). \tag{8.2} $$


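The pose is estimated through


$$ P(\hat\varphi_j \mid o_i, I_n) = \frac{P(o_i, \hat\varphi_j \mid I_n)}{P(o_i \mid I_n)}. \tag{8.3} $$


We transform this estimate to the physical pose estimate using


$$ P(\varphi_j \mid o_i, I_n) = P(\hat\varphi = \varphi_j + \psi_n \mid o_i, I_n). \tag{8.4} $$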
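Eqs. (8.1)-(8.4) can be sketched for a single view as follows. The sketch assumes that the likelihoods of the current feature vector are already available as a table over objects and camera-relative poses. It also assumes $\hat\varphi = \varphi + \psi_n$, with $\psi_n$ a whole number of pose steps. The function `classify` and the toy numbers are hypothetical.

```python
def classify(lik, shift):
    """Object and pose posteriors from one view, eqs. (8.1)-(8.4).

    lik[i][j] -- p(g_n | o_i, phihat_j): likelihood of the current feature vector
                 for object i seen at camera-relative pose j (assumed given).
    shift     -- turn-table position psi_n, assumed to be a whole number of pose steps.
    Uniform priors P(o_i) and P(phihat_j | o_i) are used, as in the text.
    """
    n_o, n_phi = len(lik), len(lik[0])
    prior = 1.0 / (n_o * n_phi)                                    # P(phihat_j | o_i) * P(o_i)
    evidence = sum(l * prior for row in lik for l in row)          # p(g_n)
    joint = [[l * prior / evidence for l in row] for row in lik]   # (8.1)
    p_obj = [sum(row) for row in joint]                            # (8.2)
    # (8.3); assumes every object has a non-zero total likelihood
    p_pose_hat = [[x / p_obj[i] for x in joint[i]] for i in range(n_o)]
    # (8.4): physical pose phi_j corresponds to camera-relative pose phi_j + psi_n
    p_pose = [[p_pose_hat[i][(j + shift) % n_phi] for j in range(n_phi)]
              for i in range(n_o)]
    return p_obj, p_pose


lik = [[0.2, 0.6, 0.2, 0.0],   # object 0: clear peak at camera-relative pose 1
       [0.1, 0.1, 0.1, 0.1]]   # object 1: no pose preference
p_obj, p_pose = classify(lik, shift=1)
```

With the turn-table at one pose step, the peak of object 0 at camera-relative pose 1 is reported as physical pose 0.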


8.1.2 Probabilistic Product Fusion 


The currently obtained probabilities $P(o_i \mid I_n)$ and $P(\varphi_j \mid o_i, I_n)$ for object hypothesis $o_i$
and pose hypothesis $\varphi_j$ are used to update the overall probabilities $P(o_i \mid I_1, \ldots, I_n) =$

²This is the case for rotations around the z-axis. Actually, as a rule this can also occur for more
degrees of freedom, because the object will be placed on the turntable at stable initial poses during the
learning phase. Thus there will usually be no learning data for unstable configurations.
$P(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n)$ and $P(\varphi_j \mid o_i, I_1, \ldots, I_n) = P(\varphi_j \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)$. In order to be able to
proceed we assume the outcomes of individual observations to be conditionally independent.
We then obtain


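$$ P(o_i \mid I_1, \ldots, I_n) \propto P(o_i \mid I_1, \ldots, I_{n-1})\, \frac{P(o_i \mid I_n)}{P(o_i)} \tag{8.5} $$
$$ P(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto P(\varphi_j \mid o_i, I_1, \ldots, I_{n-1})\, \frac{P(\varphi_j \mid o_i, I_n)}{P(\varphi_j \mid o_i)} \tag{8.6} $$
$$ P(o_i, \varphi_j \mid I_1, \ldots, I_n) = P(\varphi_j \mid o_i, I_1, \ldots, I_n)\, P(o_i \mid I_1, \ldots, I_n) \tag{8.7} $$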


The above assumption of conditional independence leads to an associative and com- 
mutative fusion scheme, whose properties will be discussed in section 8.5. 
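The product fusion of eqs. (8.5)-(8.7) can be sketched in a few lines. The sketch assumes that the per-view posteriors and the prior are given as lists. Fusing the same three views in two different orders illustrates the commutativity and associativity mentioned above. The names and numbers are illustrative.

```python
def fuse_product(prev, new, prior):
    """Eqs. (8.5)/(8.6): P(.|I_1..I_n) ∝ P(.|I_1..I_{n-1}) * P(.|I_n) / P(.), renormalized."""
    unnorm = [p * q / r for p, q, r in zip(prev, new, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def joint(p_pose_given_obj, p_obj):
    """Eq. (8.7): P(o_i, phi_j | ...) = P(phi_j | o_i, ...) * P(o_i | ...)."""
    return [[pp * po for pp in row] for row, po in zip(p_pose_given_obj, p_obj)]


prior = [1 / 3] * 3
view1 = [0.45, 0.45, 0.10]   # ambiguous between objects 0 and 1
view2 = [0.70, 0.20, 0.10]   # a second view favours object 0
view3 = [0.50, 0.30, 0.20]
a = fuse_product(fuse_product(view1, view2, prior), view3, prior)
b = fuse_product(fuse_product(view3, view1, prior), view2, prior)
```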


8.1.3 Probabilistic View Planning


View planning consists of assigning a score $s_n(\Delta\psi)$ to each possible movement $\Delta\psi$ of
the camera. The movement obtaining the highest score will be selected next (see eq. (3.14)).
The score measures the utility of action $\Delta\psi$, taking into account the expected increase
in quality of the results. We measure increases in quality by estimating the reduction
in ambiguity for the object hypotheses. In the probabilistic framework we measure
non-specificity using Shannon entropy


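$$ H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) = -\sum_i P(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) \log P(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n). \tag{8.8} $$

The score is given by the average expected entropy reduction

$$ s_n(\Delta\psi) := \sum_{o_i, \varphi_j} P(o_i, \varphi_j \mid I_1, \ldots, I_n)\, \Delta H(\Delta\psi \mid o_i, \varphi_j, I_1, \ldots, I_n) \tag{8.9} $$

The term $\Delta H$ measures the entropy loss to be expected if $o_i, \varphi_j$ were the correct object
and pose hypotheses and step $\Delta\psi$ is performed.


The expected entropy loss is again an average quantity given by

$$ \Delta H(\Delta\psi \mid o_i, \varphi_j, I_1, \ldots, I_n) := H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) - \int p(\mathbf{g} \mid o_i, \varphi_j + \psi_n + \Delta\psi)\, H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n, \mathbf{g})\, d\mathbf{g} \tag{8.10} $$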
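The score of eqs. (8.9) and (8.10) can be sketched under two simplifying assumptions. First, the feature space is discretized into a few bins, so the integral over $\mathbf{g}$ becomes a sum. Second, the hypothetical observation is fused with a joint Bayesian update over $(o_k, \varphi_l)$ instead of separate object and pose updates. The toy model is invented: two objects that look alike from two viewpoints and differ from the other two. All names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy with natural log, eq. (8.8)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def object_entropy(joint):
    return entropy([sum(row) for row in joint])


def posterior(joint, lik, g, pos):
    """P(o_k, phi_l | ..., g) ∝ P(o_k, phi_l | ...) * p(g | o_k, phi_l + pos)."""
    n_phi = len(joint[0])
    un = [[joint[k][l] * lik[k][(l + pos) % n_phi][g] for l in range(n_phi)]
          for k in range(len(joint))]
    z = sum(map(sum, un))
    return [[x / z for x in row] for row in un]


def score(joint, lik, psi_n, dpsi):
    """s_n(dpsi), eqs. (8.9)/(8.10), with the integral over g replaced by a sum over bins."""
    n_phi, n_g = len(joint[0]), len(lik[0][0])
    pos = psi_n + dpsi                        # turn-table position after the move
    h_now = object_entropy(joint)
    s = 0.0
    for i, row in enumerate(joint):
        for j, w in enumerate(row):
            if w == 0:
                continue
            h_expected = 0.0                  # expected entropy if (o_i, phi_j) is true
            for g in range(n_g):
                pg = lik[i][(j + pos) % n_phi][g]
                if pg > 0:
                    h_expected += pg * object_entropy(posterior(joint, lik, g, pos))
            s += w * (h_now - h_expected)     # weighted Delta H, eq. (8.9)
    return s


# Hypothetical toy model: 2 objects, 4 poses, 2 feature bins; lik[o][phihat][g].
lik = [[[1, 0], [1, 0], [1, 0], [1, 0]],
       [[1, 0], [1, 0], [0, 1], [0, 1]]]
joint = [[0.5, 0, 0, 0], [0.5, 0, 0, 0]]      # object unknown, physical pose phi_0 known
scores = [score(joint, lik, psi_n=0, dpsi=d) for d in range(4)]
```

In this toy case a move by two or three pose steps removes the full entropy $\log 2$, whereas the other two moves yield no gain.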


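In the experiments in chapter 9 we will also test the following measures of numerical
imprecision for the pose variable:


$$ \sigma^2(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) = \sum_{\varphi_r} \sum_{\varphi_s} P(\varphi_r \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)\, P(\varphi_s \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)\, \mathrm{cost}(\varphi_r - \varphi_s) \tag{8.11} $$

with $\mathrm{cost}(\varphi_r - \varphi_s)$ being proportional to the square of the minimum difference between
$\varphi_r$ and $\varphi_s$ (see also eq. (7.4)). The proportionality factor is chosen such that
$0 \le \sigma^2(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) \le 1$.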
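The above formula is the direct translation of the squared standard deviation to a cyclic
variable. We have already noticed the annoying fact that $\sigma^2$ is not maximal for uniform
distributions. Hence we will also use the following measure of numerical imprecision


$$ H_{\sigma^2}(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) := \alpha\, H(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)\, \sigma^2(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) \tag{8.12} $$


i.e. the product of Shannon entropy and the variance. In chapter 7 we have shown that the
first factor lets $H_{\sigma^2}$ assume its maximum value for uniform distributions, while the second
factor makes sure that $H_{\sigma^2}$ is not invariant under permutations of the arguments $\varphi_j$. The
proportionality factor $\alpha := 1/\log n_\varphi$ is chosen such that $0 \le H_{\sigma^2}(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) \le 1$. The
third measure is given by


$$ \tilde\sigma^2(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) := \sum_{\varphi_r} \sum_{\varphi_s} \tilde P(\varphi_r \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)\, \tilde P(\varphi_s \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n)\, \mathrm{cost}(\varphi_r - \varphi_s) \tag{8.13} $$

with

$$ \tilde P(\varphi_r \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) := P(\varphi_r \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n) \,/\, \max_{\varphi} P(\varphi \mid o_i, \mathbf{g}_1, \ldots, \mathbf{g}_n). $$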
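The three pose measures of eqs. (8.11)-(8.13) are short to implement for a discrete cyclic pose variable. This Python sketch assumes equally spaced poses and scales the cost so that $\mathrm{cost}(\pi) = 2$. With this scaling, two equal masses at opposite poses give $\sigma^2 = 1$. No scaling is applied to $\tilde\sigma^2$. The sketch compares a uniform distribution, an opposite pair (the symmetric-object case) and a single spike. The names are illustrative.

```python
import math


def cyclic_cost(n_phi):
    """cost(phi_r - phi_s), proportional to the squared minimal angular difference.

    Scaled so that cost(pi) = 2: two equal masses at opposite poses then give
    sigma^2 = 1, which is the largest value sigma^2 can take.
    """
    step = 2 * math.pi / n_phi
    cost = []
    for r in range(n_phi):
        row = []
        for s in range(n_phi):
            k = abs(r - s)
            d = min(k, n_phi - k) * step
            row.append(2.0 * d * d / math.pi ** 2)
        cost.append(row)
    return cost


def sigma2(p, cost):
    """Eq. (8.11)."""
    n = len(p)
    return sum(p[r] * p[s] * cost[r][s] for r in range(n) for s in range(n))


def h_sigma2(p, cost):
    """Eq. (8.12) with alpha = 1 / log(n_phi)."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p)) * sigma2(p, cost)


def tilde_sigma2(p, cost):
    """Eq. (8.13): sigma^2 of the max-normal confidences p / max(p), not rescaled."""
    m = max(p)
    return sigma2([x / m for x in p], cost)


n_phi = 72
cost = cyclic_cost(n_phi)
uni = [1.0 / n_phi] * n_phi
pair = [0.0] * n_phi
pair[0] = pair[n_phi // 2] = 0.5          # two opposite poses, e.g. a symmetric object
spike = [1.0] + [0.0] * (n_phi - 1)
for name, p in [("uniform", uni), ("opposite pair", pair), ("spike", spike)]:
    print(name, sigma2(p, cost), h_sigma2(p, cost), tilde_sigma2(p, cost))
```

Only $\sigma^2$ ranks the opposite pair above the uniform distribution. The other two measures reverse that order for these two distributions, and all three vanish for the spike.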


8.2 Possibilistic Active Fusion


Here we describe a possibilistic alternative to the probabilistic approach of guiding ac- 
tive fusion for object recognition. Possibility theory as introduced by Zadeh [139] is based 
upon fuzzy logic [138]. For an exposition of possibility theory including more recent devel- 
opments see [46]. The discussion most relevant for the current context can be found in [48]. 


The main properties of general possibility measures $\Pi : 2^\Omega \to [0,1]$ and necessity
measures $N : 2^\Omega \to [0,1]$ are


$$ \forall A, B: \quad \Pi(A \cup B) = \max(\Pi(A), \Pi(B)) $$
$$ \forall A, B: \quad N(A \cap B) = \min(N(A), N(B)) $$


with $A, B$ denoting subsets of a reference set $\Omega$ (see [46]). $\Omega$ is the set of possible singleton
events, while the sets $A, B$ are interpreted as possible (compound) events. When the set
$\Omega$ is finite, every possibility measure $\Pi$ can be defined in terms of its values on the singletons
of $\Omega$:

$$ \forall A: \quad \Pi(A) = \sup\{\pi(\omega) \mid \omega \in A\} \tag{8.14} $$


where $\pi(\omega) = \Pi(\{\omega\})$. $\pi$ is a mapping of $\Omega$ into $[0,1]$ called a possibility distribution. It
is normalized in the sense that


$$ \exists\, \omega \in \Omega: \quad \pi(\omega) = 1, $$


since $\Pi(\Omega) = 1$. We will mainly deal with singleton events of the form $\omega = (o_i, \varphi_j)$ and
$\Pi(\{\omega\} \mid I_n) = \Pi(o_i, \varphi_j \mid I_n)$ in the following³. Consequently we will not make explicit use of
necessity measures.


³We shall also use $\Pi(o_i \mid I_n)$ to denote the possibility for a particular object hypothesis, to simplify
notation while keeping it self-evident. Consistency will be assured in the following because eq. (8.14)
and eq. (8.16) are consistent.
8.2.1 Possibilistic Object and Pose Classification


In the possibilistic formulation $\Pi(o_i \mid I_n)$ stands for the possibility of the object hypothesis
$o_i$ given input image $I_n$. $\Pi(\hat\varphi_j \mid o_i, I_n)$ shall denote the possibility of the pose estimation
$\hat\varphi_j$ given true object $o_i$ and input image $I_n$, while $\Pi(o_i, \hat\varphi_j \mid I_n)$ is the possibility of the
compound hypothesis $(o_i, \hat\varphi_j)$. We estimate $\Pi(o_i, \hat\varphi_j \mid I_n)$ from the learned likelihoods


$$ \Pi(o_i, \hat\varphi_j \mid I_n) \propto P(o_i, \hat\varphi_j \mid I_n) \propto p(\mathbf{g}_n \mid o_i, \hat\varphi_j) \tag{8.15} $$


with $\max_{o_i, \hat\varphi_j} \Pi(o_i, \hat\varphi_j \mid I_n) = 1$. Doing so we interpret the likelihoods in feature space as
possibility distributions. This can be justified if we regard the process of learning the
likelihoods from samples as an exploration of possible locations in feature space for a
certain object pose hypothesis. Using the above initialization we also ensure that both the
probabilistic and the possibilistic implementations are built upon classifiers with similar
performance. For feature vectors very far away from the center of the closest likelihood
distribution, the above initialization regards all object hypotheses as fully possible⁴.


The possibility $\Pi(o_i \mid I_n)$ for object hypothesis $o_i$ is given by the projection

$$ \Pi(o_i \mid I_n) = \max_{\hat\varphi_j} \Pi(o_i, \hat\varphi_j \mid I_n). \tag{8.16} $$


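Following [63] we demand

$$ \Pi(o_i, \hat\varphi_j \mid I_n) = \Pi(\hat\varphi_j \mid o_i, I_n) \wedge \Pi(o_i \mid I_n) \tag{8.17} $$


with $\wedge$ being any t-norm. We have used $\wedge = \min$ and the least specific solution of eq.
(8.17), which in our context is given by (see [48], eq. 36):


$$ \Pi(\hat\varphi_j \mid o_i, I_n) = \begin{cases} 1 & \text{if } \Pi(o_i, \hat\varphi_j \mid I_n) = \Pi(o_i \mid I_n) \\ \Pi(o_i, \hat\varphi_j \mid I_n) & \text{otherwise} \end{cases} \tag{8.18} $$

Again we obtain the pose estimate in a standard coordinate system through

$$ \Pi(\varphi_j \mid o_i, I_n) = \Pi(\hat\varphi = \varphi_j + \psi_n \mid o_i, I_n). \tag{8.19} $$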
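The possibilistic classification step, eqs. (8.15)-(8.19), can be sketched as below with the t-norm min. The sketch assumes the same kind of likelihood table as the probabilistic case and a turn-table position given in whole pose steps. The names are hypothetical. By construction, the returned pose possibilities satisfy eq. (8.17).

```python
def possibilistic_classify(lik, shift):
    """Possibilistic object and pose classification with the t-norm min.

    lik[i][j] -- p(g_n | o_i, phihat_j) for the current view (assumed given).
    shift     -- turn-table position psi_n, assumed to be a whole number of pose steps.
    """
    n_phi = len(lik[0])
    top = max(max(row) for row in lik)
    joint = [[l / top for l in row] for row in lik]              # (8.15), max-normal
    pi_obj = [max(row) for row in joint]                         # (8.16), projection
    # (8.18): least specific pose possibilities with min(pose, pi_obj) == joint
    pi_pose_hat = [[1.0 if x == pi_obj[i] else x for x in row]
                   for i, row in enumerate(joint)]
    # (8.19): physical pose phi_j corresponds to camera-relative pose phi_j + psi_n
    pi_pose = [[row[(j + shift) % n_phi] for j in range(n_phi)] for row in pi_pose_hat]
    return joint, pi_obj, pi_pose_hat, pi_pose


lik = [[0.2, 0.6, 0.2, 0.0],
       [0.1, 0.1, 0.1, 0.1]]
joint, pi_obj, pi_pose_hat, pi_pose = possibilistic_classify(lik, shift=1)
```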


8.2.2 Possibilistic (Minimum) Fusion 


In complete analogy to the probabilistic case the currently obtained possibilities $\Pi(o_i \mid I_n)$
and $\Pi(\varphi_j \mid o_i, I_n)$ for object hypothesis $o_i$ and pose hypothesis $\varphi_j$ are used to update the
overall possibilities $\Pi(o_i \mid I_1, \ldots, I_n)$ and $\Pi(\varphi_j \mid o_i, I_1, \ldots, I_n)$. Various possibilistic fusion
operators are at our disposal. Similar to the probabilistic product approach (“and”) we have
used the standard conjunctive operator $\wedge = \min$ (see [48], eq. (49)):


$$ \Pi(o_i \mid I_1, \ldots, I_n) \propto \Pi(o_i \mid I_1, \ldots, I_{n-1}) \wedge \Pi(o_i \mid I_n) \tag{8.20} $$
$$ \Pi(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto \Pi(\varphi_j \mid o_i, I_1, \ldots, I_{n-1}) \wedge \Pi(\varphi_j \mid o_i, I_n) \tag{8.21} $$
$$ \Pi(o_i, \varphi_j \mid I_1, \ldots, I_n) = \Pi(\varphi_j \mid o_i, I_1, \ldots, I_n) \wedge \Pi(o_i \mid I_1, \ldots, I_n). \tag{8.22} $$
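Minimum fusion, eqs. (8.20)-(8.22), reduces to element-wise minima. In this sketch the proportionality in eqs. (8.20) and (8.21) is read as rescaling to a maximum of 1; that reading is an assumption about the implementation. The function names are illustrative.

```python
def renormalize_max(c):
    """Rescale so that the largest value is 1 (assumed reading of the proportionality)."""
    top = max(c)
    return [x / top for x in c] if top > 0 else c


def fuse_min(prev, new):
    """Eqs. (8.20)/(8.21): conjunctive combination with the t-norm min."""
    return renormalize_max([min(a, b) for a, b in zip(prev, new)])


def joint_min(pi_pose_given_obj, pi_obj):
    """Eq. (8.22): Pi(o_i, phi_j | ...) = min(Pi(phi_j | o_i, ...), Pi(o_i | ...))."""
    return [[min(pp, po) for pp in row] for row, po in zip(pi_pose_given_obj, pi_obj)]


fused = fuse_min([1.0, 0.8, 0.3], [0.6, 1.0, 0.2])   # min -> [0.6, 0.8, 0.2], then / 0.8
```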


⁴This is true if all likelihoods are set to constant values at distances larger than a few standard
deviations from their centers, which has been done in our experiments.
8.2.3 Possibilistic View Planning


View planning with possibilities proceeds along the same lines as view planning for
probabilistic schemes. We optimize a score using eq. (3.14) and calculate the score through a
(non-normalized) weighted average⁵


$$ s_n(\Delta\psi) = \sum_{o_i, \varphi_j} \Pi(o_i, \varphi_j \mid I_1, \ldots, I_n)\, \Delta H(\Delta\psi \mid o_i, \varphi_j, I_1, \ldots, I_n) \tag{8.23} $$
We will perform extensive tests using three measures of ambiguity for possibility
distributions. First we use the simplest index of imprecision existing for possibility distributions,
which is just given by the cardinality of the distribution (see Lamata and Moral’s measure
in section 6.1.2):


$$ H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) = \log\Big(\sum_{o_i} \Pi(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n)\Big) \tag{8.24} $$


The second measure we use is Kaufmann entropy [66], a slight generalization of Shannon
entropy for distributions not summing up to 1:


$$ H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) := -\frac{1}{S} \sum_{o_i} \Pi(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) \log \Pi(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) + \log S \tag{8.25} $$


with $S := \sum_{o_i} \Pi(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n)$. The third index of imprecision is one of the newly derived
geometric measures (see eq. (6.11)):


$$ H(o \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) = 1 - \frac{1}{n_o - 1} \sum_{o_i} \big(\Pi(o_i \mid \mathbf{g}_1, \ldots, \mathbf{g}_n) - 1\big)^2 \tag{8.26} $$


where $n_o$ denotes the number of object hypotheses. The entropy loss is again calculated using eq. (8.10).
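The three non-specificity indices of eqs. (8.24)-(8.26) can be written down directly. This sketch assumes the normalization $1/(n_o - 1)$ in the geometric measure. Under that assumption the measure ranges from 0 for a single fully possible hypothesis to 1 when all hypotheses are fully possible. The function names are illustrative.

```python
import math


def lamata_moral(pi):
    """Eq. (8.24): logarithm of the cardinality (sigma-count) of the distribution."""
    return math.log(sum(pi))


def kaufmann(pi):
    """Eq. (8.25): -(1/S) * sum(pi * log pi) + log S, with S = sum(pi)."""
    s = sum(pi)
    return -sum(x * math.log(x) for x in pi if x > 0) / s + math.log(s)


def geometric(pi):
    """Eq. (8.26), assuming the normalization 1/(n_o - 1) and max-normal input."""
    n_o = len(pi)
    return 1.0 - sum((x - 1.0) ** 2 for x in pi) / (n_o - 1)


specific = [1.0, 0.0, 0.0, 0.0]
ambiguous = [1.0, 1.0, 1.0, 1.0]
partial = [1.0, 0.5]
```

Kaufmann entropy coincides with the Shannon entropy of the sum-normalized distribution, which the `partial` example can be used to confirm.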


8.3 Fuzzy Active Fusion


Having established a possibilistic implementation let us also present other fuzzy fusion 
operators to be applied in subsequent experiments. Since the first discussion of the active 
fusion algorithm in chapter 3 contains most of the elements of the fuzzy approach we will 
only state briefly the missing details. 


8.3.1 Fuzzy Object and Pose Classification 


We remind the reader that in the fuzzy formulation $c(o_i \mid I_n)$ stands for the confidence
in the object hypothesis $o_i$ given input image $I_n$. $c(\hat\varphi_j \mid o_i, I_n)$ denotes the confidence in
the pose estimation $\hat\varphi_j$ given true object $o_i$ and input image $I_n$, while $c(o_i, \hat\varphi_j \mid I_n)$ is the
confidence in the compound hypothesis $(o_i, \hat\varphi_j)$. We will always directly estimate the
confidences $c(o_i, \hat\varphi_j \mid I_n)$ from the output of the probabilistic classifier


$$ c(o_i, \hat\varphi_j \mid I_n) = P(o_i, \hat\varphi_j \mid I_n), \tag{8.27} $$


⁵Note that we do not need to normalize the weighted average because we are interested only in the
position where the score obtains its maximum value.
thereby obtaining sum-normal fuzzy confidence values. This choice is motivated by the
finding that the probabilistic approach performs best for low outlier rates (as will be
detailed in chapter 9 when presenting the results of experimental runs). We are therefore
particularly interested in the effects of introducing fuzzy aggregation operators into a
system that follows the probabilistic formulation as closely as possible.


8.3.2 Fuzzy Fusion Operators 


We follow I. Bloch in distinguishing between conjunctive, averaging and disjunctive fusion
schemes [24]. Denoting the fusion operator by $F$ and two confidence values by $c_a$ and $c_b$,
we can state the following definitions:


• Conjunctive Fusion: $F(c_a, c_b) \le \min(c_a, c_b)$.
• Averaging Fusion: $\min(c_a, c_b) \le F(c_a, c_b) \le \max(c_a, c_b)$.
• Disjunctive Fusion: $\max(c_a, c_b) \le F(c_a, c_b)$.
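The three categories can be checked numerically for any binary operator. The hypothetical helper below tests the defining inequalities on a grid of confidence pairs. This is a sampled check, not a proof.

```python
def categories(f, steps=20, tol=1e-12):
    """Return the set of Bloch categories that f satisfies on a grid over [0, 1]^2."""
    grid = [k / steps for k in range(steps + 1)]
    conj = avg = disj = True
    for a in grid:
        for b in grid:
            v = f(a, b)
            lo, hi = min(a, b), max(a, b)
            conj = conj and v <= lo + tol
            avg = avg and lo - tol <= v <= hi + tol
            disj = disj and v >= hi - tol
    names = {"conjunctive": conj, "averaging": avg, "disjunctive": disj}
    return {name for name, ok in names.items() if ok}


print(categories(lambda a, b: a * b))          # product
print(categories(lambda a, b: (a + b) / 2))    # arithmetic mean
print(categories(min), categories(max))
```

Min and max lie on the boundary between two categories, so each satisfies both of the corresponding definitions.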


In our experimental study we will apply fusion operators of each category. Thereby we will
be able to draw conclusions about which type of fusion operator is best suited for active
vision purposes. In particular, the following operators will be studied experimentally:


Conjunctive Fusion 


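We will use two conjunctive fusion schemes, the first one being


$$ c(o_i \mid I_1, \ldots, I_n) \propto c(o_i \mid I_1, \ldots, I_{n-1})^w\, c(o_i \mid I_n), \tag{8.28} $$
$$ c(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto c(\varphi_j \mid o_i, I_1, \ldots, I_{n-1})^w\, c(\varphi_j \mid o_i, I_n) \tag{8.29} $$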
with 
PG, Ol =a Oy 


For $w = 1$ this approach reduces to the naive Bayesian fusion scheme. For $w = 0$ the
newly obtained object-pose estimation determines the outcome completely. Note also
that the seemingly more general case $a^{w_1} b^{w_2} = (a^{w_1/w_2} b)^{w_2}$ leads to the same recognition
performance as $a^w b$ with $w = w_1/w_2$.


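The second scheme we will apply appears more complicated and has been suggested
by Schweizer and Sklar [119]


$$ c(o_i \mid I_1, \ldots, I_n) = F_s\big[c(o_i \mid I_1, \ldots, I_{n-1}),\, c(o_i \mid I_n)\big], \tag{8.30} $$
$$ c(\varphi_j \mid o_i, I_1, \ldots, I_n) = F_s\big[c(\varphi_j \mid o_i, I_1, \ldots, I_{n-1}),\, c(\varphi_j \mid o_i, I_n)\big] \tag{8.31} $$

with

$$ F_s[a, b] = \frac{ab}{(a^s + b^s - a^s b^s)^{1/s}}. $$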
The parameter $s$ has to be greater than zero. The following limits exist:


$$ \lim_{s \to 0^+} F_s[a, b] = ab \tag{8.32} $$
$$ \lim_{s \to \infty} F_s[a, b] = \min(a, b) \tag{8.33} $$


Hence the operator interpolates between product fusion and minimum fusion.
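The following sketch covers the two conjunctive schemes. The first is the weighted product of eqs. (8.28)/(8.29), renormalized to sum 1; that renormalization is an assumption about how the proportionality is resolved. The second is the operator $F_s$ of eqs. (8.30)/(8.31). Choosing a very small and a very large $s$ makes the limits (8.32) and (8.33) visible numerically. The function names are illustrative.

```python
def normalize(c):
    z = sum(c)
    return [x / z for x in c]


def weighted_product(prev, new, w):
    """Eqs. (8.28)/(8.29): c ∝ prev**w * new (w = 1: naive Bayes, w = 0: new view only)."""
    return normalize([a ** w * b for a, b in zip(prev, new)])


def schweizer_sklar(a, b, s):
    """F_s[a, b] = a*b / (a**s + b**s - a**s * b**s)**(1/s) for s > 0 and a, b in [0, 1]."""
    if s <= 0:
        raise ValueError("s must be greater than zero")
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b / (a ** s + b ** s - a ** s * b ** s) ** (1.0 / s)


def fuse_ss(prev, new, s):
    """Eqs. (8.30)/(8.31) applied element-wise to two confidence vectors."""
    return [schweizer_sklar(a, b, s) for a, b in zip(prev, new)]


prev, new = [0.7, 0.2, 0.1], [0.2, 0.5, 0.3]
print(schweizer_sklar(0.3, 0.6, 1e-4))   # close to the product 0.18
print(schweizer_sklar(0.3, 0.6, 200.0))  # close to the minimum 0.3
```

For every $s > 0$ one has $F_s[a, b] \le \min(a, b)$, so the whole family is conjunctive.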


Averaging Fusion 


Averaging fusion schemes behave like a compromise and tend to suppress the effect of
outliers. We will find this verified in our experimental studies, for which we will use the
following two averaging fusion schemes:


$$ c(o_i \mid I_1, \ldots, I_n) \propto w\, c(o_i \mid I_1, \ldots, I_{n-1}) + c(o_i \mid I_n), \tag{8.34} $$
$$ c(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto w\, c(\varphi_j \mid o_i, I_1, \ldots, I_{n-1}) + c(\varphi_j \mid o_i, I_n). \tag{8.35} $$

For $w = 0$ the newly obtained object and pose estimation determines the outcome
completely, while for $w > 1$ the scheme is more conservative and sticks to the older
confidences rather than trusting the newer ones.


Applying the above scheme we weight the contributions of the individual measurements
differently. The following “sum rule” for information combination gives the true average
of all contributions:


$$ c(o_i \mid I_1, \ldots, I_n) \propto \sum_{m=1}^{n} c(o_i \mid I_m) \tag{8.36} $$
$$ c(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto \sum_{m=1}^{n} c(\varphi_j \mid o_i, I_m) \tag{8.37} $$


It has recently been argued by Kittler et al. that this approach should be very robust
against outliers [67]. Consequently, it will be tested during runs with increasing outlier
rates.


Disjunctive Fusion 


We will discuss below why disjunctive fusion fails to integrate results correctly in active
object recognition. For the purpose of completeness and to back up our conclusion by
experimental evidence, we will also include the following maximum fusion scheme in our
experiments:


$$ c(o_i \mid I_1, \ldots, I_n) \propto \max\big(c(o_i \mid I_1, \ldots, I_{n-1}),\, c(o_i \mid I_n)\big), \tag{8.38} $$
$$ c(\varphi_j \mid o_i, I_1, \ldots, I_n) \propto \max\big(c(\varphi_j \mid o_i, I_1, \ldots, I_{n-1}),\, c(\varphi_j \mid o_i, I_n)\big). \tag{8.39} $$
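A constructed example illustrates how the fusion categories react to an outlier. Four consistent views favour object 0, and a final outlier view strongly favours object 1. The sketch renormalizes to sum 1 after each step and uses $w = 4$ for the averaging scheme. These choices and all numbers are illustrative assumptions, not the experimental settings of chapter 9.

```python
def normalize(c):
    z = sum(c)
    return [x / z for x in c]


def fuse_sequence(views, op):
    """Fuse views one by one with a binary operator, renormalizing to sum 1 after each step."""
    c = normalize(views[0])
    for v in views[1:]:
        c = normalize(op(c, v))
    return c


def product(a, b):                      # conjunctive, eq. (8.28) with w = 1
    return [x * y for x, y in zip(a, b)]


def maximum(a, b):                      # disjunctive, eq. (8.38)
    return [max(x, y) for x, y in zip(a, b)]


def weighted_sum(w):                    # averaging, eq. (8.34)
    return lambda a, b: [w * x + y for x, y in zip(a, b)]


def sum_rule(views):                    # averaging, eq. (8.36)
    return normalize([sum(col) for col in zip(*views)])


def argmax(c):
    return max(range(len(c)), key=c.__getitem__)


good, outlier = [0.6, 0.3, 0.1], [0.01, 0.9, 0.09]
views = [good] * 4 + [outlier]
winners = {
    "product": argmax(fuse_sequence(views, product)),
    "max": argmax(fuse_sequence(views, maximum)),
    "weighted sum (w=4)": argmax(fuse_sequence(views, weighted_sum(4.0))),
    "sum rule": argmax(sum_rule(views)),
}
print(winners)
```

The single outlier overturns both the product and the maximum, while the two averaging schemes keep object 0. With $w = 1$ in eq. (8.34), the last view would carry as much weight as all earlier views together, and the outlier would win there as well.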
For reasons of comparison the other parts of the fuzzy active fusion algorithm follow
closely the probabilistic version. We refrain from repeating the arguments here and only 
comment that Shannon entropy is a perfectly valid measure of non-specificity for sum- 
normal distributions of fuzzy confidence values. 


Besides the use of different fuzzy fusion operators we will also implement the following 
evidence theoretic approach. 


8.4 Evidence Theoretic Active Fusion 


Evidence theory has emerged from the works of Dempster and Shafer [120] on statistical 
inference and uncertain reasoning. It has already been used for active recognition [65]. 
Our implementation is original, closely following the lines of reasoning already established
for the probabilistic and possibilistic case. For a thorough introduction to evidence
theory see [58]. We follow [65] in providing an extremely short statement of those basic
facts that are relevant for our purpose.


In evidence theory, the set of all possible outcomes in a random experiment is called
the frame of discernment, often denoted by $\Omega$. The $2^{|\Omega|}$ subsets of $\Omega$ are called
propositions, and the set of propositions is denoted by $2^\Omega$. We establish the following frame of
discernment:


$$ \Omega_o = \{o_1, o_2, \ldots\}, \quad \Omega_\varphi = \{\varphi_1, \varphi_2, \ldots\}, \quad \Omega = \Omega_o \times \Omega_\varphi \tag{8.40} $$

We also define $S_{o_i,\varphi_j} := \{(o_i, \varphi_j)\}$ and $S_{o_i} := \{o_i\} \times \Omega_\varphi$.


Contrary to the Bayesian formalism, in which probability masses can be assigned only
to singleton subsets (i.e., elements) of $\Omega$, in evidence theory probability masses are assigned
to propositions. When a source of evidence assigns probability masses to the propositions
represented by subsets of $\Omega$, the resulting function is called a basic probability assignment
(bpa) or a mass function. Formally, a bpa is a function $m : 2^\Omega \to [0,1]$ where


$$ 0 \le m(X) \le 1, \qquad m(\emptyset) = 0, \qquad \sum_{X \subseteq \Omega} m(X) = 1 \tag{8.41} $$


Subsets of $\Omega$ that are assigned nonzero probability mass are said to be focal elements of
$m$. The mass $m(\Omega)$ assigned to the whole frame of discernment reflects our uncertainty.
Certain kinds of bpa’s are particularly suitable for the representation of actual evidence.
Among these are the commonly used simple mass functions. A mass function is said to
be simple if there is a non-empty subset $A \subseteq \Omega$ such that


$$ m(A) = c, \qquad m(\Omega) = 1 - c, \qquad m(\text{elsewhere}) = 0 \tag{8.42} $$

A belief function, $\mathrm{Bel}(X)$, over $\Omega$ is defined by


$$ \mathrm{Bel}(X) = \sum_{Y \subseteq X} m(Y). \tag{8.43} $$
In other words, our belief in a proposition X is the sum of probability masses assigned to
all the propositions which imply X (including X itself). There is a one-to-one correspon- 
dence between belief functions and mass functions. 
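To make eq. (8.43) concrete, here is a minimal Python sketch. It assumes that a mass function is stored as a dict mapping frozenset focal elements to masses. The name `belief` and the toy frame {a, b, c} are illustrative only.

```python
def belief(m, X):
    """Bel(X) of eq. (8.43): total mass of all focal elements Y with Y a subset of X."""
    X = frozenset(X)
    return sum(mass for Y, mass in m.items() if Y <= X)


# Toy frame Omega = {a, b, c}; masses are keyed by frozensets (hypothetical example).
omega = frozenset("abc")
m = {frozenset("a"): 0.5, frozenset("ab"): 0.2, omega: 0.3}
bel_a, bel_ab, bel_omega = belief(m, "a"), belief(m, "ab"), belief(m, omega)
```

Here the belief in {a, b} collects the mass of {a} and of {a, b}, but not the mass of Ω, because Ω does not imply {a, b}.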


Two bpa’s $m_1$ and $m_2$ which have the same domain but correspond to two independent sources of evidence may be combined to yield a new bpa via

$$m(X) = K \sum_{X_1 \cap X_2 = X} m_1(X_1)\, m_2(X_2), \tag{8.44}$$

where

$$K^{-1} = 1 - \sum_{X_1 \cap X_2 = \emptyset} m_1(X_1)\, m_2(X_2). \tag{8.45}$$


This formula is commonly called Dempster’s rule of combination or Dempster’s orthogonal sum. The sum in eq. (8.45) is the mass that the two sources jointly assign to contradictory propositions, so $K^{-1}$ decreases as the conflict between the two sources grows. We will also use the notation

$$m = m_1 \oplus m_2 \tag{8.46}$$

which is appropriate since the combination is associative and commutative. The orthogonal sum of belief functions $\mathrm{Bel} = \mathrm{Bel}_1 \oplus \mathrm{Bel}_2$ can be obtained from eq. (8.46) by exploiting the bijection between belief functions and mass functions.
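The orthogonal sum of eqs. (8.44)-(8.45) can be sketched in a few lines of Python, again assuming dicts keyed by frozensets. The function name `dempster_combine` is hypothetical. The code normalizes by $K^{-1}$ and refuses to combine sources that are in total conflict, because the rule is undefined in that case.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Orthogonal sum m1 (+) m2 of eqs. (8.44)-(8.45).

    Mass functions are dicts mapping frozenset focal elements to masses.
    Raises ValueError in case of total conflict (K^-1 = 0).
    """
    unnormalized = {}
    conflict = 0.0
    for (x1, a), (x2, b) in product(m1.items(), m2.items()):
        inter = x1 & x2
        if inter:
            unnormalized[inter] = unnormalized.get(inter, 0.0) + a * b
        else:
            conflict += a * b  # X1 and X2 contradict each other
    k_inv = 1.0 - conflict  # K^-1 of eq. (8.45)
    if k_inv <= 0.0:
        raise ValueError("total conflict: orthogonal sum is undefined")
    return {x: v / k_inv for x, v in unnormalized.items()}


omega = frozenset("abc")
m1 = {frozenset("a"): 0.6, omega: 0.4}  # simple mass function focused on {a}
m2 = {frozenset("b"): 0.5, omega: 0.5}  # simple mass function focused on {b}
m12 = dempster_combine(m1, m2)
```

In this toy case the conflicting mass is 0.6 · 0.5 = 0.3, so $K^{-1} = 0.7$. The result is m({a}) = 3/7 and m({b}) = m(Ω) = 2/7.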


8.4.1 Evidence Theoretic Object and Pose Classification 


In evidence theory the initialization of the mass functions is the most important step. 
According to our minimalistic programme we will seek a mass assignment that departs 
only slightly from the already described approaches. By doing so, we can make only lim- 
ited use of various features of evidence theory. We will discuss possible alternatives below. 


Here we adopt a common strategy in which only singleton subsets and $\Omega$ itself are focal elements. Since we assume that the object recognition task delivers confidences only for the singleton subsets $S_{o_i,\varphi_j}$, we face the problem of how much mass we should assign to $\Omega$, which reflects the overall remaining uncertainty.


One way to initialize the quantities could be to account for the positive evidence for hypothesis $(o_i,\varphi_j)$ by setting its mass proportional to the output of the probabilistic classifier. The mass $m(\Omega|g_n)$* assigned to $\Omega$ could reflect our lack of knowledge, since very non-specific results of the object recognition task will induce a high mass $m(\Omega|g_n)$:

$$m(S_{o_i,\varphi_j}|g_n) = \alpha\, P(o_i,\varphi_j|g_n), \qquad m(\Omega|g_n) = \beta\, \mathrm{NonSpecificity}\big(P(o,\varphi|g_n)\big)$$

One of the two parameters $\alpha$, $\beta$ could be eliminated using

$$\sum_{o_i,\varphi_j} m(S_{o_i,\varphi_j}|g_n) + m(\Omega|g_n) = 1 \tag{8.47}$$

The other parameter quantifies the relative importance of the mass $m(\Omega|g_n)$ compared to the masses $m(S_{o_i,\varphi_j}|g_n)$. Adopting this scheme we still have one parameter to fine-tune.

*Writing $m(\Omega|g_n)$ instead of $m(\Omega)$, we extend the notation of evidence theory to make the dependence on the input data explicit.


There is another very common, parameter-free way to initialize masses from confidences that are given only on singleton subsets. For this purpose we use simple mass functions $m_{o_i,\varphi_j}$ to represent the evidence delivered by the object and pose classification task when given the feature vector $g_n$. The mass function $m_{o_i,\varphi_j}$ should encode the knowledge we have obtained from having observed the classification result for hypothesis $(o_i,\varphi_j)$ only. Thus the positive evidence carried by the probability $P(o_i,\varphi_j|g_n)$ is assigned completely to $m_{o_i,\varphi_j}(S_{o_i,\varphi_j}|g_n)$, while the remaining evidence $1 - P(o_i,\varphi_j|g_n)$ is given to the frame of discernment. This expresses that we do not know where to assign the remaining mass $1 - P(o_i,\varphi_j|g_n)$, since we assume that we have observed the classification result for hypothesis $(o_i,\varphi_j)$ only. Repeating this step for every object-pose hypothesis, we establish $n_o \times n_\varphi$ mass functions, each given by

$$m_{o_i,\varphi_j}(S_{o_i,\varphi_j}|g_n) = P(o_i,\varphi_j|g_n), \qquad m_{o_i,\varphi_j}(\Omega|g_n) = 1 - P(o_i,\varphi_j|g_n) \tag{8.48}$$


One can regard these $n_o \times n_\varphi$ mass functions as the shared knowledge of a pool of “experts”, each of which knows only the mass for one single object-pose hypothesis. The knowledge of all experts can now be fused by Dempster’s rule of combination to obtain one integrated mass assignment. Thus the overall result of one classification step is defined to be the separable mass function equal to the orthogonal sum of the simple mass functions

$$m(\cdot|g_n) = \bigoplus_{o_i,\varphi_j} m_{o_i,\varphi_j}(\cdot|g_n) \tag{8.49}$$

Its focal elements are the singleton subsets $S_{o_i,\varphi_j}$ and the frame of discernment $\Omega$. Using definition (8.44) it is easy to show that

$$m(S_{o_i,\varphi_j}|g_n) = \frac{P(o_i,\varphi_j|g_n)}{1 - P(o_i,\varphi_j|g_n)}\, m(\Omega|g_n) =: \mathrm{Odds}(o_i,\varphi_j|g_n)\, m(\Omega|g_n) \tag{8.50}$$

$$m(\Omega|g_n) = \Big\{1 + \sum_{o_i,\varphi_j} \mathrm{Odds}(o_i,\varphi_j|g_n)\Big\}^{-1} \tag{8.51}$$


The singleton elements are assigned masses proportional to the odds for these hypotheses, while the whole frame of discernment $\Omega$ is assigned a mass that increases as the sum of the odds decreases. In particular, if only one hypothesis has $P(o_i,\varphi_j|g_n) = 1$, we obtain $m(\Omega|g_n) = 0$. Thus the obtained assignment reproduces the qualitative features that we tried to introduce more heuristically before. In contrast to that approach, the presented scheme does not require fine-tuning of any free parameters. The composition of a separable mass function from a pool of simple mass functions is a well-studied and often applied procedure for initializing mass functions from confidences that are given only for singletons [114, 120, 58].
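The closed form (8.50)-(8.51), together with the object belief (8.53), could be coded as follows. This sketch assumes that every $P(o_i,\varphi_j|g_n) < 1$. Otherwise the odds are infinite and the limiting case discussed above applies. The names `separable_masses` and `object_belief` and the probabilities are hypothetical.

```python
def separable_masses(P):
    """Masses of the orthogonal sum (8.49) of the simple mass functions (8.48).

    P maps each hypothesis (o_i, phi_j) to P(o_i, phi_j | g_n). Uses the closed
    form of eqs. (8.50)-(8.51); assumes every P < 1 so that all odds are finite.
    Returns (masses of the singletons S_{o_i,phi_j}, mass of Omega).
    """
    odds = {h: p / (1.0 - p) for h, p in P.items()}
    m_omega = 1.0 / (1.0 + sum(odds.values()))
    return {h: o * m_omega for h, o in odds.items()}, m_omega


def object_belief(m_single, obj):
    """Bel(S_{o_i} | g_n) of eq. (8.53): sum of singleton masses over all poses of obj."""
    return sum(m for (o, _phi), m in m_single.items() if o == obj)


# Hypothetical classifier output for two objects and two poses.
P = {("o1", "phi1"): 0.5, ("o1", "phi2"): 0.25, ("o2", "phi1"): 0.25, ("o2", "phi2"): 0.0}
m_single, m_omega = separable_masses(P)
```

With these numbers m(Ω|g_n) = 3/8. The leading hypothesis also receives mass 3/8, even though its probability is 1/2, because the separable mass function keeps part of the evidence on Ω. You can cross-check the closed form by folding the simple mass functions (8.48) with a direct implementation of eq. (8.44).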


In analogy to eq. (3.7) the masses for the physical pose are given by 


TUS ops Ign) = TUS sa gicnaje| By): (8.52) 
Finally we obtain the belief in a specific object hypothesis through

$$\mathrm{Bel}(S_{o_i}|g_n) = \sum_{\varphi_j} m(S_{o_i,\varphi_j}|g_n). \tag{8.53}$$

Note that $S_{o_i}$ and $S_{o_i,\varphi_j}$ belong to the same frame of discernment $\Omega$.


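8.4.2 Fusion in Evidence Theory
Dempster’s rule of combination gives

$$m(\cdot|I_1,\ldots,I_n) = m(\cdot|I_1,\ldots,I_{n-1}) \oplus m(\cdot|I_n) \tag{8.54}$$

For the considered separable mass functions Dempster’s rule reads in detail

$$m(\Omega|I_1,\ldots,I_n) \propto m(\Omega|I_1,\ldots,I_{n-1})\, m(\Omega|I_n) \tag{8.55}$$

$$\begin{aligned} m(S_{o_i,\varphi_j}|I_1,\ldots,I_n) \propto{} & m(S_{o_i,\varphi_j}|I_1,\ldots,I_{n-1})\, m(S_{o_i,\varphi_j}|I_n) \\ & + m(S_{o_i,\varphi_j}|I_1,\ldots,I_{n-1})\, m(\Omega|I_n) \\ & + m(\Omega|I_1,\ldots,I_{n-1})\, m(S_{o_i,\varphi_j}|I_n) \end{aligned} \tag{8.56}$$

where the common proportionality constant is the normalization factor $K$ of eq. (8.44).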
Usually it can be expected that after some active fusion steps the overall uncertainty $m(\Omega|\cdot)$ becomes small. In this case the above fusion scheme closely resembles probabilistic product fusion as presented in eqs. (8.5)-(8.7). We can anticipate major differences in the early phases of the active fusion loop, which may have a long-term effect on subsequent steps.
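Here is a minimal sketch of one fusion step (8.55)-(8.56) for masses restricted to singletons and Ω. The names are hypothetical, and both mass functions are assumed to be defined on the same set of hypotheses. Repeated fusion shrinks m(Ω|·). After that, the first (product) term dominates the update, in line with the remark on probabilistic product fusion.

```python
def fuse_step(prev_single, prev_omega, new_single, new_omega):
    """One step of eq. (8.54) for masses on singletons plus Omega, eqs. (8.55)-(8.56).

    prev_* hold m(.|I_1..I_{n-1}), new_* hold m(.|I_n); both dicts share the same keys.
    Products of two different singletons are conflicting and are dropped; the
    result is renormalized by the non-conflicting mass (this is the factor K).
    """
    single = {
        h: prev_single[h] * new_single[h]
        + prev_single[h] * new_omega
        + prev_omega * new_single[h]
        for h in prev_single
    }
    omega = prev_omega * new_omega
    norm = omega + sum(single.values())
    return {h: v / norm for h, v in single.items()}, omega / norm


single, omega = {"h1": 0.3, "h2": 0.2}, 0.5
for _ in range(3):  # three identical (hypothetical) observations
    single, omega = fuse_step(single, omega, {"h1": 0.4, "h2": 0.1}, 0.5)
```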


8.4.3 View Planning with Evidence Theory
The system based on evidence theory calculates the score used in eq. (3.14) with

$$s_n(\Delta\psi) = \sum_{o_i,\varphi_j} m(S_{o_i,\varphi_j}|I_1,\ldots,I_n)\, \Delta H(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) \tag{8.57}$$

To find out how ambiguous the obtained object hypotheses are, we again use Kaufmann entropy

$$H(o|g_1,\ldots,g_n) := -\frac{1}{S} \sum_{o_i} \mathrm{Bel}(o_i|g_1,\ldots,g_n) \log \mathrm{Bel}(o_i|g_1,\ldots,g_n) + \log S \tag{8.58}$$

with $S := \sum_{o_i} \mathrm{Bel}(o_i|g_1,\ldots,g_n)$. Other measures of non-specificity, as reviewed for example in [93, 94], could be used as well.
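Eq. (8.58) can be written in code as follows; the function name is hypothetical. Writing $p_i = b_i / S$ shows that the Kaufmann entropy is simply the Shannon entropy of the renormalized beliefs. This is convenient here because the object beliefs do not sum to one whenever $m(\Omega|\cdot) > 0$.

```python
import math


def kaufmann_entropy(bel):
    """Eq. (8.58): H = -(1/S) * sum_i b_i * log(b_i) + log(S), with S = sum_i b_i.

    This equals the Shannon entropy of the renormalized values b_i / S, so it
    also applies when the beliefs do not sum to one (part of the mass sits on Omega).
    Terms with b_i == 0 contribute nothing (limit of b * log b as b -> 0).
    """
    S = sum(bel)
    if S <= 0:
        raise ValueError("beliefs must have a positive sum")
    return -sum(b * math.log(b) for b in bel if b > 0) / S + math.log(S)
```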
8.5 Discussion


Having defined the details of the various approaches, let us reflect upon common properties and differences of the discussed implementations of active object recognition modules.


8.5.1 Object and Pose Classification 


Since in our experiments we learn the likelihoods $p(g|o_i,\varphi_j)$ from examples, the Bayesian classifier presented in section 8.1.1 is a very natural realization of an object and pose estimation task. For the other approaches we have addressed the problem of how to initialize these schemes from a probabilistic classifier, i.e. how to transform the probabilistic classification result into a possibilistic or evidence theoretic framework. The issue of how to design such interfaces always arises when one starts to combine algorithms that are based upon different uncertainty calculi. In general there is no unique correct solution to this problem. Thus there could be other ways to initialize the various schemes than the ones we have presented.


The proposed initializations of the possibilistic and evidence theoretic frameworks in eqs. (8.15) and (8.48) have been chosen for the following reasons.


• The proposed methods yield classifiers that have similar confusion matrices. The results of the comparison are thus not biased by differences in classifier performance.


• It has already been mentioned in section 2.2 that the approximations in the definitions (8.15) and (8.17) lead to the original recognition algorithm of Nayar and Murase given by eq. (2.9) [27]. Thus our initializations imitate approximations made in currently used systems.


• The composition of a total mass function through the orthogonal sum of simple mass functions in eq. (8.49) is a commonly employed procedure in Dempster-Shafer theory of evidence [114, 120, 58]. In our experiments we can study the effect of this operation on the overall performance of an active recognition system. It must be kept in mind, however, that the chosen separable mass functions do not fully exploit the capabilities of evidence theory to handle uncertain information.


This last comment on evidence theory deserves some further discussion, because the initialization is the most important part of the evidence theoretic model. In general we can assign masses to every element of $2^\Omega$, i.e. to $2^{n_o n_\varphi}$ different sets. This number grows very quickly, and it is clearly inconceivable to assign masses to each possible compound hypothesis. Following common practice, we have reduced the problem to assigning masses only to the $n_o \times n_\varphi$ singleton hypotheses and to the whole frame of discernment. This is of course a gross simplification and may even be an over-simplification in many cases.
It is interesting to note that Hutchinson and Kak [65] use a more sophisticated model that is still polynomial in time: at each classification step they assign positive masses only to disjoint subsets of the frame of discernment. These subsets, however, need not be singleton subsets as is the case in our formalism. Let us briefly discuss what modifications would be needed to incorporate these ideas. We can assign masses in such a model by the following procedure. First, cluster the object-pose hypotheses according to their confidences, such that hypotheses with similar confidences belong to the same cluster. This is a one-dimensional crisp clustering problem (since the confidences are given by a single number) and can be solved very efficiently. Each cluster defines a subset $A \subseteq \Omega$ consisting of all the singleton hypotheses that obtain similar confidences. We propose to assign to each subset a mass that is equal (or proportional) to the sum of the confidences of the singleton hypotheses in the cluster. This initialization ensures that hypotheses with similar confidences are grouped together into a single subset that represents the compound hypothesis. For example, if all confidences are similar we get $m(\Omega|g) = 1$. If only one singleton hypothesis $(o_i,\varphi_j)$ is favored, we have $m(S_{o_i,\varphi_j}|g) = c(o_i,\varphi_j|g)$ and $m(\Omega \setminus S_{o_i,\varphi_j}|g) = 1 - c(o_i,\varphi_j|g)$, and so on.
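Here is one possible realization of the sketched initialization. The text does not fix the clustering method, so this hypothetical `cluster_masses` assumes a simple rule: sort the confidences and open a new cluster whenever two consecutive values differ by more than a threshold `gap`. The hypothetical `equal_share` computes Hutchinson and Kak's quantity C = m(A|g)/|A|.

```python
def cluster_masses(conf, gap=0.1):
    """Assign masses to clusters of hypotheses with similar confidences.

    conf maps hypotheses to confidences c(o_i, phi_j | g) summing to one.
    Assumed clustering rule: sort by confidence and start a new cluster when
    consecutive confidences differ by more than `gap`. Each cluster A gets
    m(A | g) = sum of the confidences of its members.
    """
    if not conf:
        return {}
    ranked = sorted(conf, key=conf.get, reverse=True)
    clusters = [[ranked[0]]]
    for prev, h in zip(ranked, ranked[1:]):
        if conf[prev] - conf[h] > gap:
            clusters.append([h])
        else:
            clusters[-1].append(h)
    return {frozenset(A): sum(conf[h] for h in A) for A in clusters}


def equal_share(masses, h):
    """C(h | g) = m(A | g) / |A| for the unique focal subset A containing h."""
    for A, mass in masses.items():
        if h in A:
            return mass / len(A)
    raise KeyError(h)


masses = cluster_masses({"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1})
```

With one dominant confidence of 0.7 and three equal ones, the masses are m({a}) = 0.7 and m({b, c, d}) = 0.3, matching the example in the text. Uniform confidences collapse into a single cluster equal to the whole (toy) frame, with mass 1.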


In this scheme fusion is still performed by Dempster’s orthogonal sum, eq. (8.54), which of course no longer reduces to eqs. (8.55)-(8.56). Since the coverings of $\Omega$ obtained from different viewpoints $\psi_n$ always have non-empty intersections, Dempster’s rule will always give well-defined results. View planning will still be based upon the confidences for the singleton hypotheses. This is because estimating the utility of a specific action $\Delta\psi$ requires temporarily assuming that some singleton hypothesis $(o_i,\varphi_j)$ is correct. To that end Hutchinson and Kak have established quantities of the type $C(o_i,\varphi_j|g) := m(A|g)/|A|$, where $A \subseteq \Omega$ is the unique focal subset that contains the hypothesis, $S_{o_i,\varphi_j} \subseteq A$. Instead of sharing $m(A|g)$ equally among the $|A|$ hypotheses in $A$, we can do better in our setting and simply use the original confidences $c(o_i,\varphi_j|g)$ for view planning. In our experiments the confidences $c(o_i,\varphi_j|g)$ sum to one, so we may use Shannon entropy to evaluate non-specificity.


The main difference between this more sophisticated evidence theoretic implementation and our approach lies in the initialization and fusion stages. Because of the importance of these steps, the sketched approach would likely produce better results than the minimalistic extension of the general active fusion algorithm we are using. However, since our main interest focuses on the unifying model presented in chapter 3, only the implementation discussed in section 8.4 will be tested experimentally.


8.5.2 Fusion 
Conjunctive, Averaging and Disjunctive Fusion 


The fusion schemes presented for the probabilistic, the possibilistic and the evidence theoretic approach all display conjunctive behavior in the sense of [24]. Thus the worst classification result dominates the outcome of classifier fusion. This is indeed a desired behavior for active systems: in order to disambiguate very similar objects, the system has to move to an appropriate viewpoint where the classification result for the right object will (usually) still be high, while the wrong object will (usually) receive its (perhaps only) small classification result. An averaging fusion scheme tends to lower the weight of this decisive observation, while a disjunctive scheme simply misses its importance. We will demonstrate this experimentally in chapter 9 with the fuzzy fusion operators introduced in section 8.3.


On the other hand, conjunctive fusion implies that positive evidence carries comparatively little weight. In particular, one single outlier can relatively quickly destroy the effect of multiple positive observations in favor of the correct hypothesis. The burden then lies on subsequent actions, which need to “recover” from such erroneous interpretations. In the Bayesian case these considerations have been confirmed by a recent comparative analysis of the error tolerance of probabilistic product fusion and fusion by averaging: theoretical and experimental evidence in [67] clearly favored the averaging operation. We will be able to confirm the robustness of averaging fusion schemes against outliers. Consequently, we will arrive at the conclusion that object recognition can indeed be made very robust by relying on multiple observations and using the right fusion operator. At the same time, however, the active planning part (the core part according to the active fusion paradigm) will face serious difficulties in environments where the outcome of the next step cannot be predicted with a minimum degree of certainty. We will come back to this issue in the concluding discussion after having presented the experimental evidence.
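A toy computation can illustrate the trade-off between conjunctive and averaging fusion. This hypothetical sketch uses made-up confidences for three hypotheses, with hypothesis 0 correct. A normalized product stands in for the conjunctive schemes, and the arithmetic mean stands in for averaging.

```python
import math


def product_fusion(observations):
    """Conjunctive fusion: normalized product of the per-view confidence vectors."""
    fused = [math.prod(column) for column in zip(*observations)]
    total = sum(fused)
    return [f / total for f in fused]


def average_fusion(observations):
    """Averaging fusion: arithmetic mean of the per-view confidence vectors."""
    return [sum(column) / len(observations) for column in zip(*observations)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical confidences for three hypotheses; hypothesis 0 is correct.
good_views = [[0.6, 0.2, 0.2]] * 3
outlier = [0.01, 0.9, 0.09]  # one view in which the classifier fails badly
```

Without the outlier, the product is far more decisive, giving about 0.93 to the correct hypothesis versus 0.6 for the average. After one bad view, the product switches to hypothesis 1, whereas the average still ranks hypothesis 0 first.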


In any case, the fact remains that the choice of the right fusion operator depends strongly on the characteristics of the specific classifiers. Object classification using parametric eigenspaces produces results that are amenable to active approaches (which rely on conjunctive fusion to be most effective). When multiple observations are allowed, the performance of individual classifiers can obviously be lower than in static systems. It still cannot be lowered below the point where conjunctive fusion fails to increase recognition performance. This will happen if, in at least one of the multiple observations, the correct hypothesis is erroneously associated with a confidence value that is lower than the minimum value obtained for one of the wrong hypotheses.


Associative and Commutative Fusion 


Probabilistic product fusion, possibilistic minimum fusion and Dempster’s rule of combi- 
nation are all associative and commutative. Because of associativity it does not matter 
whether we fuse only the latest single observation with the already fused result from pre- 
vious observations or whether we fuse all individual observations at each step. Due to 
commutativity it does not matter in what order the observations are made. Thus the fact that later observations are planned according to the results obtained at previous observations is not reflected in the presented fusion schemes. The aggregation operators used here do not capture the temporal aspects of active fusion.


Dempster’s rule is always associative and commutative. In probability theory, both associativity and commutativity are consequences of the assumption of conditional independence. This assumption is commonly made in Bayesian classifier fusion because the relevant conditional probabilities are usually unknown [67]. In possibility theory (or fuzzy logic), various non-associative and/or non-commutative aggregation operators exist [48]. But again we face the problem of how to choose an operator that best encodes the implied planning phase.


The given fusion algorithms behave as if we had a static multi-camera setup where 
none of the multiple observations has precedence over the others. Because of this, the 
presented active fusion algorithms are likely to be less effective than they could be if more 
sophisticated aggregation operators were used. However, we have not found any examples 
in which the fusion operators would completely fail to integrate results correctly because 
of their associativity and commutativity. 


The Spatial Constraints 


Observations in active object recognition are not only subject to temporal constraints but 
also to spatial constraints. Again the presented fusion schemes do not obey any spatial 
constraints. While it appears that disregarding the temporal constraints only leads to 
less effective fusion schemes, not considering the spatial consistency of observations can 
actually cause a complete failure of the algorithm in some special cases. 


Let us discuss the issue with an example. Suppose the object database contains two identical objects that appear symmetric under rotation by, e.g., 180° (for example two identical blocks), and one of the objects carries a marker on one side. Then the manifolds in feature space look similar to the manifolds for the two hypothetical objects $o_8$ and $o_9$ depicted in Fig. 8.1a. In this case fusing confidences according to eqs. (8.5), (8.20) or (8.54) will fail to integrate results correctly when trying to recognize the object without the marker ($o_8$ in our hypothetical example). This can be understood easily if one imagines a static system with an arbitrary number of cameras placed all over the view-sphere, observing the object without the marker. Each separate observation will produce equal confidences for both considered objects, because each single view may stem from either of the two objects. But the whole set of observations is only possible for the object without the marker, because no marker can be found even though images from opposite views have been taken. However, if fusion is based upon any of the eqs. (8.5), (8.20) or (8.54), this fact will not be accounted for. Instead, even after fusing all single results, both object hypotheses will achieve equally high confidence values.


The point we want to make is that the considered fusion schemes do not take into account all the constraints that exist between different observations. In particular, we assume that the angular displacement between different camera positions does not influence the outcome of the fusion process. Hence the schemes miss the fact that it is absolutely impossible to turn object $o_9$ through 180° without encountering the marker.


Concerning probabilistic fusion, the naive Bayesian fusion operator (without terms 
for spatial consistency) has been applied widely by different authors working on active 
(a) (b)


Figure 8.1: Figures (a) and (b) depict manifolds in feature space in case all possible observations for the hypothetical object $o_8$ could as well stem from another object $o_9$. The depicted case arises, for example, for two identical blocks that are symmetric under rotations of 180°. One of the blocks ($o_9$ in the above figure) carries a marker on one side. Figure (a) depicts the case of a practically complete overlap of the possible observations for object $o_8$ with those for object $o_9$. Object $o_8$ has to be symmetric to produce a manifold where each point corresponds to two (or more) views of the object. Object $o_9$ is no longer symmetric because of the marker. Therefore each point on its manifold corresponds to a single view. Figure (b) illustrates the case in which $o_8$ is not fully symmetric, i.e. feature vectors for different views are not equal but only very similar. The effect described in the text can also manifest itself to some degree in this case. Such a situation can occur, for example, if the chosen feature space is not appropriate for resolving finer details of different views even though the views are distinct.


recognition tasks [35, 97, 28, 124], since it allows for efficient information integration in the probabilistic case. Still, the above example shows that in some cases disregarding spatial information will cause the system to fail. The necessary conditions for this to happen may seem artificial: all conceivable features for one object must also be possible for another object. However, it should not be overlooked that what really counts is not the actual visual appearance of the objects but rather their internal representation, which may be quite similar for objects of different visual appearance (see also Fig. 8.1b and section 9.7). We will stick to the above fusion schemes in this work, knowing that the necessary conditions for failure of the fusion algorithm do not arise in our model database. We refer the interested reader to the forthcoming thesis of M. Prantl [106], which will contain a probabilistic fusion scheme that pays attention to the spatial constraints between different observations.


8.5.3 View Planning


The view-planning stage depends critically on the chosen measures of non-specificity, imprecision and cost, and on the fusion scheme that is used in eq. (3.20) to determine the utility of an action. Whenever using combined measures of utility, we have applied the following fusion scheme during the calculation of utility:

$$U(\Delta\psi|o_i,\varphi_j,I_1,\ldots,I_n) = \mathrm{low}(C)\,\big(w_H\, \Delta H + w_I\, \Delta I\big) \tag{8.59}$$

with $w_H + w_I = 1$. Of course, the above scheme is just one possible mathematical expression for the linguistic statement “An action is very useful if it is associated with low costs and big reductions in either non-specificity or numerical imprecision (or both).” A shortcoming of eq. (8.59) may be that the factors $w_H$ and $w_I$ are constant in time. It will usually be advantageous to try to recognize the object first and only then to take the imprecision in pose more seriously into account. Thus $w_I$ could actually be made a decreasing function of the non-specificity $H$ of the object hypotheses. However, we have found no indication in our experiments that such refinements are necessary. One reason may be that as the non-specificity $H$ gets small (i.e. once the object has been recognized), $\Delta H$ usually gets small too, and the above sum is automatically dominated by $w_I\, \Delta I$.
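The following sketch shows eq. (8.59) and how an action might be picked by maximal utility. The default weight w_H = 0.5 and the candidate moves with their (ΔH, ΔI) values are hypothetical. The values are treated as if they had already been averaged over the object-pose hypotheses, as the score of eq. (3.14) would do.

```python
def utility(delta_h, delta_i, w_h=0.5, low_cost=1.0):
    """Eq. (8.59): U = low(C) * (w_H * dH + w_I * dI) with w_I = 1 - w_H.

    delta_h, delta_i: expected reductions of non-specificity and numerical
    imprecision for one action, assumed already rescaled to comparable ranges.
    low_cost stands for low(C), which is 1 in the experiments.
    """
    if not 0.0 <= w_h <= 1.0:
        raise ValueError("w_h must lie in [0, 1]")
    return low_cost * (w_h * delta_h + (1.0 - w_h) * delta_i)


# Hypothetical expected reductions (dH, dI) for three turn-table moves in degrees.
candidates = {30: (0.4, 0.1), 90: (0.2, 0.5), 180: (0.1, 0.1)}
best = max(candidates, key=lambda a: utility(*candidates[a]))
```

With equal weights the 90° move wins (0.35 versus 0.25 and 0.1). Setting `w_h=1.0` would make the choice depend on ΔH alone.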


Another point to be mentioned is the possible range of $\Delta H$ and $\Delta I$. Since $\Delta H$ and $\Delta I$ are directly related to the measures of non-specificity $H$ and numerical imprecision $I$ (see eqs. (3.16), (3.18)), their range is determined by the range of $H$ and $I$. When using the above scheme (8.59), it is a good idea to demand $H, I \in [0, 1]$. Otherwise the weighted average may be dominated artificially by the quantity that is measured in smaller units. It is thus necessary to know the maximum values of $H$ and $I$ in order to rescale the quantities appropriately.
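For Shannon entropy over n hypotheses the maximum is log n, so one way to guarantee H ∈ [0, 1] is to divide by that maximum. The following is a sketch with a hypothetical function name. An analogous maximum would have to be worked out for whichever imprecision measure I is used.

```python
import math


def normalized_shannon_entropy(p):
    """Shannon entropy of the distribution p divided by its maximum value log(len(p)).

    The result lies in [0, 1], which keeps dH on the same scale as a
    rescaled imprecision term in eq. (8.59).
    """
    n = len(p)
    if n < 2:
        return 0.0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(n)
```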


The issue of what cost to assign to a particular action will not be investigated further in our experiments ($\mathrm{low}(C) = 1$). The cost of an action can be taken into account by disfavoring big angular displacements of the camera. The cost term becomes much more important in cases where the actuator (for example a robot) does not know its current position with certainty, or whenever certain translations are forbidden (because of walls etc.). In order to avoid dangerous motions that may lead to collisions, the cost of the corresponding actions should be extremely high compared to the cost of relatively safe motions. Since the utility of very costly actions can be expected to be very low, such actions may also be disregarded right from the beginning by reducing the space of possible actions according to the current hardware coordinates (or their estimates). We need neither a cost term nor a reduced space of possible actions, because all conceivable angular displacements are allowed and can easily be obtained in our experiments.


Chapter 9 


Experiments 


In this chapter we will present the experimental results obtained for the active object recognition approaches discussed in chapter 8. We begin by stating the motivation for the various experiments that have been carried out. After describing the experimental setup and the database of object models used, we will present an exemplary run of the probabilistic implementation, followed by a report of the results of extensive test runs exploring various aspects of our active fusion algorithm.


9.1 What Sort of Experiments and Why?


Since we are aware that it is usually difficult to draw general conclusions from experimental results (see also the discussion at the beginning of chapter 8), we have designed our experiments such that we can provide evidence for relatively general statements about all active fusion algorithms that follow the design from chapter 3.


• We have seen in chapter 8 that our active fusion algorithm is general enough to allow for implementations based on rather different uncertainty calculi. What is still missing is

  - an experimental demonstration that these implementations indeed successfully solve the considered active fusion problem, and

  - a study of the effectiveness of different implementations in a concrete environment.

Both issues will be addressed in section 9.4, where we will compare the effectiveness of the probabilistic approach, the possibilistic implementation, the evidence theoretic scheme and the fuzzy fusion operators.


• We also seek experimental confirmation of the general conclusions regarding different types of fusion operators stated in section 8.5. The tests reported in section 9.4 will provide evidence that conjunctive schemes are appropriate for active fusion. In section 9.5 we will show the failure of disjunctive schemes and discuss the reason for that failure. Finally, the behavior of two averaging schemes will be studied in section 9.6. Both will perform well when no outliers are present, though they are less effective than conjunctive schemes. In a detailed study we will proceed to show the superiority of averaging schemes as the failure rate of the recognition algorithm increases. In addition, this experiment will allow us to draw important general conclusions regarding the behavior of active fusion in non-robust environments.


Figure 9.1: A sketch plus a picture of the experimental setup with 6 degrees of freedom and 15 different illumination situations.


• To conclude the chapter, we will study the effect of using different measures of non-specificity and numerical imprecision through a few examples. Especially regarding numerical imprecision, we will provide experimental evidence for the assertion that using standard deviation can be quite counter-productive in active fusion algorithms if uncertainty is modeled by sum-normal distributions of confidence values (such as probabilities).


9.2 The Experimental Environment 


Figure 9.1 gives a sketch of the experimental setup. The hardware is placed in a room
of approximately 4 m × 5 m × 2.3 m (width × length × height). A rectangular frame
is mounted to one side-wall. The frame carries a platform with a camera. The camera 
observes objects which are placed on top of a rotating table of approximately 1m in di- 
ameter. The system allows for the following six degrees of freedom: table rotation (360°), 
horizontal translation (2.5m), vertical translation (1.5m), pan (50°), tilt (420°), zoom 
(6-48mm). In addition, there are 4 independent light sources mounted on a rectangular 
frame, yielding 15 different illumination conditions. 


In the experiments described below, two lights are switched on and the system moves the turn-table supporting the rigid object and captures images. For convenience the camera is held fixed and the turn-table is used to arrive at different viewing positions. Similarly, the xz-table can be used to control the camera position while the object is kept fixed. The general mathematical formulation remains the same for both choices.*

*However, the two approaches are not fully equivalent due to shadowing effects. In particular, learning data acquired by rotating the turn-table while the camera and the lighting are kept fixed may not be representative of test runs during which the turn-table and the lights are held fixed while the camera is moved. We rotate the turn-table, keeping the lighting constant and the camera at 0°.


Figure 9.2: A raw image as captured by the camera.


9.2.1 The Main Database of Object Models


In order to compare the effectiveness of different fusion and planning schemes, we will test the recognition systems on 8 objects of similar appearance concerning shape, reflectance and color (Fig. 9.4). Two objects, $o_7$ and $o_8$, can only be discriminated by a white marker attached to the rear side of object $o_8$ (Fig. 9.6). During the learning phase the 8 cars are rotated on a computer-controlled turn-table in 5° intervals, with constant illumination and fixed distance to the camera. Fig. 9.2 shows one of the raw images, while Fig. 9.5 displays a representative subset of the views obtained for object $o_7$ after preprocessing.


During preprocessing the object region is automatically segmented from the background using a combined brightness and gradient threshold operator. Pixels classified as background are set to zero gray level. The images are then geometrically rescaled to 100 × 100 pixels and converted to image vectors $y(I)$ of unit length.


The set of all image vectors for the 8 cars is then used to construct eigenspaces of 3, 5 and 10 dimensions. These are the only feature spaces we will be using. We will not bother to reconstruct the eigenspace when extending the database of object models during the discussion of measures of numerical imprecision. See section 2.2.1 for an explanation of why this is a perfectly valid way to proceed. For our purposes, sticking to the established eigenspace has the additional advantage that it allows for a more thorough test of the considered active fusion algorithm: if the model database is enlarged without reconstructing the eigenspace, the feature space is no longer adapted to it. The situation is thus similar to the use of feature spaces based upon other global image descriptors, such as moments. The conclusions we will be able to draw from our experiments remain unaffected by the kind of feature space used.
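Here is a compact NumPy sketch of the eigenspace construction. It assumes the images have already been segmented and rescaled. It uses the usual PCA recipe (mean-centring followed by an SVD), which may differ in detail from the procedure actually used. The function names and the random stand-in data are hypothetical.

```python
import numpy as np


def build_eigenspace(images, k):
    """PCA eigenspace from preprocessed images of shape (N, H, W).

    Each image is flattened and scaled to unit length; the vectors are then
    centred on their mean (a common choice in parametric eigenspaces, assumed here).
    Returns the mean vector and the k leading eigenvectors as rows of a (k, H*W) array.
    """
    X = images.reshape(len(images), -1).astype(float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the eigenvectors of its covariance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def project(image, mean, basis):
    """Feature vector g: the unit-length image vector projected into the eigenspace."""
    y = image.reshape(-1).astype(float)
    y /= np.linalg.norm(y)
    return basis @ (y - mean)


# Hypothetical stand-in data: 20 random 10 x 10 "images" instead of 100 x 100 views.
rng = np.random.default_rng(0)
images = rng.random((20, 10, 10))
mean, basis = build_eigenspace(images, k=3)
g = project(images[0], mean, basis)
```

In the setting described above, k would be 3, 5 or 10 and the images would be 100 × 100 pixels. All-zero images would need to be excluded before normalization.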


In subsequent experiments we will mostly use the three-dimensional eigenspace. Only in the discussion of measures of numerical imprecision will we also use the higher-dimensional eigenspaces. There they serve to demonstrate the difficulty of capturing textural details with low-dimensional feature spaces and how this affects pose estimation. Here we just add that the unusually low dimensionality (3 dimensions in most of the remainder of this chapter) turns out to be sufficient for an active system to recognize the considered objects. We will later comment on potential uses of the possibility to keep the feature space low-dimensional.


After having established the feature space we need to estimate the likelihoods of each view. To this end, additional samples are collected for each view of the eight objects, emulating possible segmentation errors: the object region in the normalized image is shifted in a randomly selected direction by 3% of the image dimension, as proposed in [87]. The corresponding feature values are then extracted for these images by projection into the eigenspace.
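The perturbation step can be sketched as follows. This is a minimal illustration in Python/NumPy, assuming the normalized view is a 2-D gray-value array and the eigenspace is given by a mean image and k orthonormal basis rows; the helper names (`shift_image`, `project`, `perturbed_features`), the continuous random direction and the zero-filled border are assumptions, not details taken from the thesis or from [87].

```python
import numpy as np

def shift_image(img: np.ndarray, frac: float, rng: np.random.Generator) -> np.ndarray:
    """Shift the image content by `frac` of the image size in a random direction.

    Pixels uncovered by the shift are set to 0, i.e. to a black background
    (an assumption; how the border is filled is not specified in the text).
    """
    h, w = img.shape
    angle = rng.uniform(0.0, 2.0 * np.pi)
    dy = int(round(frac * h * np.sin(angle)))
    dx = int(round(frac * w * np.cos(angle)))
    out = np.zeros_like(img)
    # Part of the source that stays inside the frame after the shift.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

def project(img: np.ndarray, mean_img: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Eigenspace coordinates of an image; `basis` holds k orthonormal rows."""
    return basis @ (img.ravel() - mean_img.ravel())

def perturbed_features(img, mean_img, basis, n_samples=30, frac=0.03, seed=0):
    """Feature vectors of `n_samples` randomly shifted copies of one view."""
    rng = np.random.default_rng(seed)
    return np.stack([project(shift_image(img, frac, rng), mean_img, basis)
                     for _ in range(n_samples)])
```

The resulting sample cloud per (object, pose) is what the likelihood model below is fitted to.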


The obtained sample distribution in eigenspace, drawn for a single object, is depicted in Fig. 9.3a. Fig. 9.3b visualizes the overall ambiguity in the representation, showing the significant overlap of the manifolds of the eight cars. The manifolds have been computed by interpolation between the means of the pose distributions (circles). Fig. 9.3c shows parts of the manifolds of o7 and o8. The two manifolds coincide for most of the samples, but for pose values around 180° the marker on o8 becomes visible and the manifolds are separated.


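The likelihood of a sample g, given a specific object o_i and pose φ_j, is modeled by a unimodal, multivariate normal distribution

$$ p(g \mid o_i, \varphi_j) = \frac{1}{(2\pi)^{k/2}\,|\Sigma_{o_i,\varphi_j}|^{1/2}} \exp\left\{-\frac{1}{2}\,(g-\mu_{o_i,\varphi_j})^{T}\,\Sigma_{o_i,\varphi_j}^{-1}\,(g-\mu_{o_i,\varphi_j})\right\} \qquad (9.1) $$

where k is the dimension of the eigenspace, and the mean μ_{o_i,φ_j} and covariance Σ_{o_i,φ_j} are estimated from the data that has been corrupted by segmentation errors.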
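Estimating the parameters of eq. (9.1) from the perturbed samples and evaluating the density can be sketched as below. The sketch works with the logarithm of eq. (9.1) for numerical stability; the function names and the optional ridge term `reg` are hypothetical additions for illustration.

```python
import numpy as np

def fit_view_model(samples: np.ndarray, reg: float = 0.0):
    """Mean and covariance of one (object, pose) view from perturbed samples.

    samples has shape (n_samples, k). `reg` is an optional ridge term added to
    the diagonal (a hypothetical safeguard against singular covariances).
    """
    mu = samples.mean(axis=0)
    sigma = np.atleast_2d(np.cov(samples, rowvar=False))
    return mu, sigma + reg * np.eye(sigma.shape[0])

def log_likelihood(g: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Logarithm of eq. (9.1) for a feature vector g in a k-dimensional eigenspace."""
    k = mu.shape[0]
    diff = g - mu
    _, logdet = np.linalg.slogdet(sigma)
    mahalanobis = diff @ np.linalg.solve(sigma, diff)
    return float(-0.5 * (k * np.log(2.0 * np.pi) + logdet + mahalanobis))
```

Working with log-densities avoids underflow of the exponential when a feature vector lies far from the mean of a view.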
(a) Sample distribution (b) Manifolds (c) Marker distance


Figure 9.3: (a) Eigenspace representation of the image set of object 0;. The three most promi- 
nent dimensions are shown. The interpolated manifold (line) and samples (dots, 
emulating segmentation errors) are drawn for poses of 5° intervals. Fig. (b) depicts 
a plot of the manifolds of all 8 objects (circles denote means of pose-specific sam- 
ples). Fig. (c) illustrates the distance between the manifolds of two similar objects 
that is introduced by a discriminative marker. 
Figure 9.4: The eight cars used for the experiments. The object region is segmented from the background and the image is normalized in scale. Note that objects o7 and o8 appear identical from the chosen viewpoint.




Figure 9.5: A sub-set of poses for object 7. The pose is varied by rotation in 30° intervals about a single axis under constant illumination.


Figure 9.6: The rear views of objects 7 and 8. The marker attached to the rear side of object o8 is the only feature discriminating it from object o7.
9.2.2 The Extended Database of Object Models


The set of eight cars provides a good test-bed for comparing recognition rates because the objects are similar in both reflectance and shape properties. However, this set of objects does not cover all interesting cases that can arise for the pose estimation module. In particular, we have not yet considered objects with rotational symmetries (like symmetric blocks world objects) or objects that look nearly identical over an entire range of view-points (like texture-less cups). Homogeneous objects with some texture on one of their sides are also of interest because they can be used to study how representational power increases with the dimensionality of the feature space. In feature spaces of lower dimensionality the texture may not be represented in every detail, and the object may be represented similarly over a large range of view-points. As the dimensionality of the feature space increases, more details can be captured and the pose estimation becomes more precise².


For these reasons we have augmented the database of objects in the experiments performed to test measures of numerical imprecision. Figure 9.7 depicts six objects from the Columbia Object Image Libraries COIL-100 and COIL-20 [91]. These libraries consist of color images of 100 objects (COIL-100) and gray-scale images of 20 objects (COIL-20). The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary object pose with respect to a fixed color camera, and images of the objects were taken at pose intervals of 5 degrees. Since these images were taken under conditions very similar to those of our initial set of car images, they provide a useful pool of possible object views for simulated recognition runs.


We first transformed the images to rescaled gray-value images. The rescaled images have been projected into the eigenspace which had previously been established with the images of the eight cars³. The effects of inaccurate hardware positioning and segmentation errors have been simulated by projecting shifted images. Samples of the established probability densities have been drawn off-line for use during planning (see eq. (3.17)). Effects of noise and disturbances during real runs have been simulated by selecting one random sample of image features at each active step to represent the “captured” image.
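The simulated recognition runs can be pictured with the following sketch: a pool of feature samples is stored per (object, view) off-line, and at each active step one of them is drawn at random to play the role of the captured image. The dictionary layout of `sample_pool`, the helper names and the convention that the observed view angle is the sum of object pose and camera position (modulo 360°, snapped to the 5° grid) are all assumptions made for this illustration.

```python
import numpy as np

def observed_view(object_pose_deg: int, camera_deg: int, step_deg: int = 5) -> int:
    """View angle seen by the camera, snapped to the 5-degree recording grid.

    Hypothetical convention: moving the camera by camera_deg is treated like
    rotating the turn-table by the same angle (see the caveat about shadowing).
    """
    angle = (object_pose_deg + camera_deg) % 360
    return (int(round(angle / step_deg)) * step_deg) % 360

def draw_observation(sample_pool: dict, obj: int, view_deg: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Pick one stored feature vector for (obj, view_deg) as the 'captured' image."""
    samples = sample_pool[(obj, view_deg)]          # shape (n_samples, k)
    return samples[rng.integers(len(samples))]
```

Because the pool is filled off-line, the same samples can serve both the planning stage and the simulated observations.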


Figures 9.7-9.11 depict representative views of the objects. The set of objects has been chosen such that it comprises interesting “limiting cases” for pose estimation. Fig. 9.7 depicts a Japanese cup. The interesting feature of this object is that certain views (≈ 110°..240°) appear nearly or completely indistinguishable, while other views contain texture (the Japanese writing at ≈ 250°..100°) and appear similar only in low-dimensional feature spaces. Objects o10 and o11 do not only appear similar from certain view-points but do so systematically: o10 is symmetric under every rotation around the z-axis, while object o11 is symmetric under rotations of 180°. These symmetry properties make pose estimation either impossible or ambiguous, and it will be very instructive to study the behavior of measures of numerical imprecision in such cases. Object o12 has been included because its shape appears slightly symmetric under rotations of 180° in low-dimensional feature spaces. Even though the individual pose estimates will not display a single peak when only a three-dimensional eigenspace is used, the system will be able to obtain the correct pose by fusing multiple observations. Hence we will be able to demonstrate the necessity of fusing pose estimates, a step which has quite often been omitted by other researchers, including [124]. Furthermore, these examples will help us to show that studying the effects of symmetry is of interest not only for “constructed” examples (like symmetric blocks world objects) but is relevant for real world objects as soon as the feature space is not able to represent all the necessary details.

²The eigenspace representation in particular tends to become insensitive to high-frequency textures as the number of eigenvectors is reduced.

³See section 2.2.1 for a discussion of why this is not only a perfectly valid but, for our purposes, even a particularly appropriate way to proceed.

(a) o9 (b) o10 (c) o11 (d) o12 (e) o13 (f) o14

Figure 9.7: Six objects from the Columbia object image library. The images have been transformed to gray-scale and normalized in size before projection into the eigenspace established with the images of the eight cars.

Figure 9.8: A sub-set of poses for object o9. The pose is varied by rotation in 50° intervals about a single axis under constant illumination.

Figure 9.9: A sub-set of poses for object 01,. The pose is varied by rotation in 50° intervals about a single axis under constant illumination.

Figure 9.10: A sub-set of poses for object o13. The pose is varied by rotation in 50° intervals about a single axis under constant illumination.

Figure 9.11: A sub-set of poses for object o14. The pose is varied by rotation in 50° intervals about a single axis under constant illumination.


9.3 An Exemplary Sequence of Actions


We begin our description of the experiments with the discussion of an exemplary sequence of actions for the probabilistic approach. This is intended to facilitate the interpretation of the subsequent plots, which summarize the results of a multitude of complete runs. Unless otherwise stated, we use a three-dimensional eigenspace and the model database consists of the 8 cars o1..o8.


       | ψ0 = 0°            | ψ1 = 290°          | ψ2 = 125°          | ψ3 = 170°
 o_i   | P(o_i|g_0)  P_f    | P(o_i|g_1)  P_f    | P(o_i|g_2)  P_f    | P(o_i|g_3)  P_f
 o1    | 0.001       0.001  | 0.000       0.000  | 0.139       0.000  | 0.000       0.000
 o2    | 0.026       0.026  | 0.000       0.000  | 0.000       0.000  | 0.000       0.000
 o3    | 0.314       0.314  | 0.097       0.203  | 0.055       0.074  | 0.091       0.013
 o4    | 0.027       0.027  | 0.096       0.017  | 0.097       0.011  | 0.002       0.000
 o5    | 0.000       0.000  | 0.098       0.000  | 0.335       0.000  | 0.032       0.000
 o6    | 0.307       0.307  | 0.015       0.031  | 0.009       0.001  | 0.224       0.000
 o7    | 0.171       0.171  | 0.354       0.403  | 0.224       0.597  | 0.822       0.967
 o8    | 0.153       0.153  | 0.338       0.344  | 0.139       0.315  | 0.032       0.019

Table 9.1: Probabilities for object hypotheses in an exemplary run. See also Fig. 9.12a. ψn is the viewing position at step n, and P_f denotes the fused probabilities P(o_i|g_0,..,g_n) after the n-th observation. Object o7 is the object under investigation.


Table 9.1 lists the probabilities for the object hypotheses in a selected run that finishes after three steps with an entropy of 0.17 (threshold 0.2) and with correct object and pose estimates. Figure 9.12a displays the captured images. Object o7 has been placed on the turn-table at pose 0°. Note that this run constitutes a hard test for the proposed method: the initial conditions have been chosen such that the first image, when projected into the three-dimensional eigenspace, does not deliver the correct hypothesis. Consequently, object recognition relying on a single image would erroneously favor object o3 at pose φ = 0° (pose estimates are not included in Table 9.1). Only additional images can clarify the situation. The next action moves the system to position 290° and the probability of object o3 is lowered. Objects o7 and o8 are now the favored candidates, but it still takes one more action to eliminate object o3 from the list of possible candidates. In the final step the system tries to disambiguate only between objects o7 and o8. Thus the object is looked at from the rear, where the two differ the most.

Figure 9.12: (a) Sample pose sequence actuated by the planning system (see Table 9.1). A comparison of the number of necessary active steps (b) using a random (top) and the presented look-ahead policy (below) illustrates the improved performance.
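The fused columns P_f of Table 9.1 can be checked with a few lines of NumPy. The sketch assumes the simplest form of the probabilistic product rule, a normalized product of the per-view posteriors P(o_i|g_n) under a uniform prior; the rule used in the thesis (eq.s (8.5)-(8.6)) may additionally involve the pose hypotheses, so this is an illustration rather than the original implementation. Starting from the rounded table entries, it reproduces the tabulated P_f values with deviations below 0.002.

```python
import numpy as np

# Per-view posteriors P(o_i | g_n) from Table 9.1; columns are o1..o8.
P_VIEWS = np.array([
    [0.001, 0.026, 0.314, 0.027, 0.000, 0.307, 0.171, 0.153],  # psi_0 = 0 deg
    [0.000, 0.000, 0.097, 0.096, 0.098, 0.015, 0.354, 0.338],  # psi_1 = 290 deg
    [0.139, 0.000, 0.055, 0.097, 0.335, 0.009, 0.224, 0.139],  # psi_2 = 125 deg
    [0.000, 0.000, 0.091, 0.002, 0.032, 0.224, 0.822, 0.032],  # psi_3 = 170 deg
])

def product_fusion(p_views: np.ndarray) -> np.ndarray:
    """Fused posteriors after each step: normalized product, uniform prior assumed."""
    belief = np.full(p_views.shape[1], 1.0 / p_views.shape[1])
    history = []
    for p in p_views:
        belief = belief * p              # combine with the new observation
        belief = belief / belief.sum()   # renormalize to a probability vector
        history.append(belief)
    return np.array(history)

fused = product_fusion(P_VIEWS)
# fused[0] favors o3 (index 2); fused[-1] puts about 0.97 on o7 (index 6).
```

The sketch also makes the hard-test character of the run visible: o3 leads after the first view and only the later views shift the mass to o7.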


The results of longer test runs are depicted in Fig. 9.12b, which shows the number of active steps necessary to reach a certain entropy threshold for both a random strategy and the presented look-ahead policy. The obtained improvements in performance will be confirmed in more detail in the following experiments.
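The look-ahead idea of selecting the action with the best expected outcome can be illustrated with a strongly simplified sketch. It assumes that, for every candidate action and every hypothesis, a predicted per-view posterior is available (the hypothetical array `predicted`; in the thesis such predictions are derived from the off-line samples of eq. (3.17)), scores each action by the expected entropy of the fused result and picks the minimum. Poses, action costs and the exact utility measure of the thesis are left out, and entropy is measured in nats here, which is an assumption.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats; zero entries contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def fuse(belief: np.ndarray, p_view: np.ndarray) -> np.ndarray:
    """Normalized product of the current belief and a per-view posterior."""
    post = belief * p_view
    total = post.sum()
    return post / total if total > 0 else belief

def choose_action(belief: np.ndarray, predicted: np.ndarray):
    """One-step look-ahead over candidate actions.

    predicted[a, t] is the per-view posterior expected after action a if
    hypothesis t is true. Returns the action with the lowest expected entropy.
    """
    scores = []
    for pred_a in predicted:
        expected = sum(belief[t] * entropy(fuse(belief, pred_a[t]))
                       for t in range(len(belief)) if belief[t] > 0)
        scores.append(expected)
    return int(np.argmin(scores)), scores

# Toy case: o7 vs o8 after a frontal view (hypothetical numbers).
belief = np.array([0.5, 0.5])
predicted = np.array([
    [[0.5, 0.5], [0.5, 0.5]],        # action 0: another frontal view
    [[0.95, 0.05], [0.05, 0.95]],    # action 1: rear view, marker decides
])
best, scores = choose_action(belief, predicted)   # best == 1
```

A complete run would repeat this selection, fuse the actually observed posterior, and stop once the entropy of the fused belief falls below the chosen threshold.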


9.4 Conjunctive Active Fusion 


In this section we review the recognition rates obtained for the considered conjunctive implementations of the active fusion approach, i.e. the probabilistic, the possibilistic and the evidence theoretic approaches plus the conjunctive fuzzy fusion schemes. As a by-product of the test runs in which the effectiveness of the different methodologies is compared, we find that all presented implementations relying on conjunctive fusion operators solve the active fusion problem satisfactorily.


One complete run consists of a pre-specified number of steps (30). The average object recognition rate is recorded at each step. Object recognition performance is measured assuming a crisp decision in favor of the object obtaining the highest confidence⁴.
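The crisp decision rule and the per-step averaging can be written compactly. In this sketch the array layout (runs × steps × objects) and the function name are assumptions; note that `argmax` silently breaks ties in favor of the lowest index, so ambiguous outputs (relevant for the disjunctive schemes discussed later) would need separate treatment.

```python
import numpy as np

def recognition_rate_per_step(confidences: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
    """Fraction of runs whose crisp decision is correct at each step.

    confidences: (n_runs, n_steps, n_objects) fused confidences.
    true_labels: (n_runs,) index of the object placed on the turn-table.
    """
    decisions = confidences.argmax(axis=2)          # (n_runs, n_steps)
    correct = decisions == true_labels[:, None]     # broadcast over steps
    return correct.mean(axis=0)

# Two toy runs, two steps, three objects.
conf = np.array([[[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]],
                 [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]])
labels = np.array([2, 0])
rates = recognition_rate_per_step(conf, labels)     # [0.5, 1.0]
```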


During planning only recognition performance is optimized; no attention is paid to increasing the precision of the pose estimate. Nevertheless, the pose estimate usually becomes as precise as the discretized data permit (up to ±5°..10°).


⁴In the probabilistic case this amounts to the maximum a posteriori decision rule.
Probabilistic, Possibilistic and Evidence Theoretic Active Fusion


Two test cases have been considered for each method: without view-planning (random actions) and with active view planning switched on in complete active fusion cycles. Every method has been tested starting with 30 different sample images for each of the 8 car objects and 72 possible initial poses, resulting in 2 × 17,280 test runs that have been traced over 30 steps. The runs with a random strategy serve two purposes: first, to compare the effect of the fusion modules alone, and second, to obtain a baseline against which the effectiveness of action planning can be measured.


Figure 9.13 depicts the results of the test runs made to compare the probabilistic, the possibilistic and the evidence theoretic approach. At step zero (first image captured) all three methods display the same recognition performance of 52% on the complete database of objects, and 37% if only the similar objects o7 and o8 are used. A non-active approach would stop at this level. Subsequent observations lead to distinct increases in recognition performance, no matter whether the next action is planned or not. However, if the actions are planned, the system reaches its final recognition rate much faster than if the next viewing position is chosen at random. The differences in recognition rate are already considerable when the results are averaged over the whole database of objects, but become even more pronounced for objects o7 and o8 alone. In that case the best viewing position (≈ 180°) is chosen at a very early stage only if the system plans the next action. With a random movement the system is likely to end up again in a position from which objects o7 and o8 cannot be disambiguated. Hence, we observe maximum differences of 15% for the probabilistic approach (Fig. 9.13d), 16% for the possibilistic implementation (Fig. 9.13e) and 6% for the evidence theoretic formulation (Fig. 9.13f).


All methods perform better with view-planning, both in recognition rate and in recognition speed. The most striking result is the enormous increase in recognition performance when action planning is switched on in the possibilistic case (Figs. 9.13b and e). This nicely demonstrates that the whole system (fusion plus planning) is more than the sum of its parts. Adding a planning phase also improves the performance of the Dempster-Shafer approach, but to a much smaller extent than in the possibilistic case.


To facilitate comparison of the recognition rates of the different implementations, the difference in recognition rate is depicted in Fig. 9.14 for a variety of configurations. In all considered examples the probabilistic approach turns out to be the most successful one. The possibilistic implementation comes close to the recognition performance of the probabilistic version if action planning is switched on (Fig. 9.14g). But even though the same recognition rate is attained in the end, the recognition speed of the probabilistic approach is unsurpassed. The effect is more pronounced if only objects o7 and o8 are considered. A graphic example is given in Fig. 9.14k, where the difference in recognition rate between the probabilistic and the evidence theoretic approach reaches a maximum of around 17% at step four.
Figure 9.13: Average recognition rate (0%..100%) over active fusion step (0..30). The figures contain three plots with superimposed standard deviation error-bars: the average recognition rate for runs with action planning switched on (upper plot), for runs relying on a random strategy (middle plot) and the difference of the two recognition rates (lower plot). Figure (a) depicts the achieved recognition rates on the complete model database for the probabilistic approach. Figures (b) and (c) show the results for the possibilistic and evidence theoretic approach respectively. Figures (d)-(f) depict the achieved recognition rates when using only the similar objects o7 and o8.




Randomly chosen actions: (a) Prob-Poss o1..o8 (b) Prob-DS o1..o8 (c) Poss-DS o1..o8 (d) Prob-Poss o7,o8 (e) Prob-DS o7,o8 (f) Poss-DS o7,o8
Planned actions: (g) Prob-Poss o1..o8 (h) Prob-DS o1..o8 (i) Poss-DS o1..o8 (j) Prob-Poss o7,o8 (k) Prob-DS o7,o8 (l) Poss-DS o7,o8


Figure 9.14: Difference in recognition rate versus active recognition steps for various approaches. The figures in row (A) depict the difference in recognition performance when the system randomly chooses the next viewing position: (a) compares the probabilistic and possibilistic, (b) the probabilistic and Dempster-Shafer and (c) the possibilistic and Dempster-Shafer approach. Initially all three methods have the same performance and consequently the difference is 0%. The figures in row (B) display the differences in recognition rate when using a random strategy to recognize o7 and o8. Action planning has been switched on for the results depicted in rows (C) and (D). From left to right, the figures show the difference in recognition rate between the probabilistic and possibilistic, probabilistic and Dempster-Shafer, and possibilistic and Dempster-Shafer approach. Row (D) serves to compare the recognition rates for o7 and o8 for the different implementations.
Conjunctive Fuzzy Fusion


Motivated by the success of the probabilistic scheme, we have studied small variations of that approach. In particular, the probabilistic product fusion operator has been replaced by two families of fuzzy aggregation operators: the product of powers F_w[a,b] of eq.s (8.28, 8.29), and the operator of Schweizer and Sklar, eq.s (8.30, 8.31),

$$ F_s[a,b] = \frac{ab}{\left(a^{s} + b^{s} - a^{s}b^{s}\right)^{1/s}} $$

Both operators generalize product fusion, which is obtained for F_w with w = 1 and for F_s in the limit s → 0. See section 8.3.2 for further comments on these two families of fusion operators.
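The Schweizer-Sklar operator F_s can be sketched in Python as follows. Factoring out max(a,b)^s gives the algebraically equivalent form min(a,b) · (1 + (min/max)^s - min^s)^(-1/s), which avoids floating-point underflow for large s; this rewriting and the function name are choices made for the sketch. Evaluating it for a small and a large s shows the two limits stated in the text: the product ab and the minimum.

```python
import math

def schweizer_sklar(a: float, b: float, s: float) -> float:
    """F_s[a, b] = ab / (a^s + b^s - a^s b^s)^(1/s) for a, b in [0, 1], s >= 0.

    s == 0 is treated as its limit, the product ab.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    if s == 0.0:
        return a * b
    hi, lo = max(a, b), min(a, b)
    # a^s + b^s - a^s b^s = hi^s * (1 + (lo/hi)^s - lo^s); both powers stay in [0, 1]
    inner = 1.0 + (lo / hi) ** s - lo ** s
    return lo * inner ** (-1.0 / s)

print(schweizer_sklar(0.3, 0.6, 1e-6))   # close to the product 0.18
print(schweizer_sklar(0.3, 0.6, 1.0))    # about 0.25, between product and minimum
print(schweizer_sklar(0.3, 0.6, 200.0))  # close to the minimum 0.3
```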


Let us start with the product of powers approach F_w[a,b]. This fusion operator depends on a parameter w, and we have performed extensive test runs for different values of that parameter. The results are depicted in figures 9.15 and 9.16. The three figures in the top row depict the surface of recognition rate (0%..100%) over the number of observations (0..30) and the considered range of the parameter w (0..4). For visualization purposes the same surface is depicted from different view-points. The plots below correspond to slices cut through the surface of the top row; they depict the recognition rate after a specific number of steps as a function of the exponent w. The measured data points and their error-bars are included in the plots.


For w = 0 the newly obtained confidences are equal to the fused confidences, so no fusion takes place at all. Consequently the recognition rate stays constant over time. As w approaches 1, the maximum recognition rate of the naive probabilistic product fusion approach is reached. Performance rates are lower for all other values of w. The same effect can be observed in Fig. 9.16, where only objects o7 and o8 are placed on the turn-table. Product fusion again performs best, the effect being even more pronounced in this case.


Similar results have been obtained with the operator F_s suggested by Schweizer and Sklar [119]. This operator interpolates between minimum fusion and product fusion. The results are depicted in figures 9.17 and 9.18. Again the three figures in the top row depict the surface of recognition rate (0%..100%) over the number of observations (0..30) and the considered range of the parameter s (0..4), while the subsequent sub-figures depict slices of recognition performance vs. s. In these figures product fusion is obtained as s → 0, while minimum fusion corresponds to s → ∞. Again we see that product fusion performs best.


Summing up, we have found that basing our active fusion approach upon product fusion gives the best results. When applied to our model database, the probabilistic implementation outperforms not only possibilistic and evidence theoretic fusion but also representative families of parameterized, conjunctive fuzzy fusion schemes.
Figure 9.15: Planned Runs, Objects o1,..,o8, 30 Steps, Fusion F_w. Results of
planned experiments with the product of powers approach, eq.s (8.28,8.29). The 
figures in the top row depict the surface of recognition rate (0%..100%) over number 
of steps (0..30) and exponent w (0..4). For visualization purposes the same surface 
is depicted from different view-points. The plots below correspond to slices cut in 
the surface of the top row. These plots depict recognition rate at a specific number 
of steps over exponent w. For example, figure (d) shows the recognition rate after 
the 2nd step for different algorithms indexed by exponent w. The measured data 
points and their error-bars are included in the plots. All objects have been placed 
on the turn-table, starting with every possible initial pose. 30 test runs have been 
performed for every possible initial condition. 
Figure 9.16: Planned Runs, Objects o7, o8, 30 Steps, Fusion F_w. Results of planned
experiments with the product of powers approach, eq.s (8.28, 8.29). The figures in the
top row depict the surface of recognition rate (0%..100%) over number of steps 
(0..30) and exponent w (0..4). For visualization purposes the same surface is de- 
picted from different view-points. The plots below correspond to slices cut in the 
surface of the top row. These plots depict recognition rate at a specific number 
of steps over exponent w. For example, figure (d) shows the recognition rate after 
the 2nd step for different algorithms indexed by exponent w. Exponent w has been 
varied from 0..4. The measured data points and their error-bars are included in 
the plots. Only objects 7 and 8 have been placed on the turn-table at each of 
their initial poses and 30 test runs have been performed for every possible initial 
condition. 
Figure 9.17: Planned Runs, Objects o1,..,o8, 30 Steps, Fusion F_s. Results
of experiments with a conjunctive fusion operator as suggested by Schweizer and
Sklar, eq.s (8.30, 8.31). The figures in the top row depict the surface of recognition
rate over number of steps and exponent s. For visualization purposes the same 
surface is depicted from different view-points. The plots below correspond to slices 
cut in the surface of the top row. These plots depict recognition rate at a specific 
number of steps over exponent s. For example, figure (d) shows the recognition 
rate after the 2nd step for different algorithms indexed by exponent s. Exponent 
s has been varied from 0..4. The measured data points and their error-bars are 
included in the plots. Each of the objects has been placed on the turn-table at 
each of its initial poses and 30 test runs have been performed for every possible 
initial condition. 
Figure 9.18: Planned Runs, Objects o7, o8, 30 Steps, Fusion F_s. Results
of experiments with a conjunctive fusion operator as suggested by Schweizer and
Sklar, eq.s (8.30,8.31). The figures in the top row depict the surface of recognition 
rate over number of steps and exponent s. For visualization purposes the same 
surface is depicted from different view-points. The plots below correspond to slices 
cut in the surface of the top row. These plots depict recognition rate at a specific 
number of steps over exponent s. For example, figure (d) shows the recognition 
rate after the 2nd step for different algorithms indexed by exponent s. Exponent 
s has been varied from 0..4. The measured data points and their error-bars are 
included in the plots. Only objects o7 and o8 have been placed on the turn-table at
each of their initial poses and 30 test runs have been performed for every possible 
initial condition. 
9.5 Disjunctive Active Fusion⁵


While product fusion outperforms all other considered conjunctive fusion schemes, none of those conjunctive approaches has turned out to be entirely inappropriate for active fusion purposes. This changes as we turn to disjunctive fusion.


Figure 9.19: Planned Runs, Objects o1,..,o8 and o7, o8, 30 Steps, Fusion max(a,b). Results of experiments with maximum fusion. The figures of the first row depict the failure rate (a), the rate of correct but ambiguous results (b) and the rate of unambiguous and correct results (c). The plots of the second row display the same quantities, the only difference being that only objects o7 and o8 are put on the turn-table.


In order to demonstrate the problem with disjunctive fusion, let us discuss the performance of the maximum combination rule (see eq.s (8.38) and (8.39)). The results are depicted in figure 9.19. Contrary to the other plots shown so far, figures 9.19(a) and 9.19(d) display the failure rate instead of the recognition rate. The failure rate (i.e. the number of selections of the wrong object) gets extremely small with disjunctive fusion; it is indeed lower than the failure rate for probabilistic fusion. Nevertheless, disjunctive fusion schemes face a severe problem. As can be seen from figures 9.19(b) and 9.19(e), the number of ambiguous object hypotheses (with multiple hypotheses obtaining the same maximum value) gets very large when active fusion is based upon maximum fusion. This stems simply from the fact that for each hypothesis the best result is collected. Since we are using a very ambiguous low-dimensional object representation, it frequently happens that one of the wrong hypotheses also gets a high confidence value. Disjunctive fusion accumulates these confidences and is unable to single out the correct hypothesis. This is also visible in figures 9.19(c) and 9.19(f), where the recognition rate is depicted only for those cases in which the system has recognized the correct object without also delivering wrong hypotheses that are assigned an equally high confidence. Clearly, the rate of correct and unambiguous outputs gets very low as more observations are made. This behavior is clearly undesirable for active fusion purposes. Indeed, for a fair comparison with other fusion schemes only figures 9.19(c) and 9.19(f) should be used to measure recognition rate. Doing so, we come to the conclusion that disjunctive fusion is inappropriate for active fusion purposes.
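The accumulation effect can be reproduced with a toy example. The confidence values below are invented for illustration (they are not measured data), and maximum fusion is taken in its plain form max(a,b) applied hypothesis-wise; the helper names are hypothetical.

```python
import numpy as np

def fuse_max(views: np.ndarray) -> np.ndarray:
    """Disjunctive fusion: keep the best confidence seen so far for every hypothesis."""
    return views.max(axis=0)

def fuse_product(views: np.ndarray) -> np.ndarray:
    """Conjunctive (product) fusion, normalized to sum to one."""
    fused = views.prod(axis=0)
    return fused / fused.sum()

def is_ambiguous(fused: np.ndarray, tol: float = 1e-12) -> bool:
    """True if more than one hypothesis attains the maximal fused confidence."""
    return int(np.count_nonzero(fused >= fused.max() - tol)) > 1

# Toy confidences for (o7, o8) with o7 on the turn-table (hypothetical values).
views = np.array([
    [1.0, 1.0],   # frontal view: o7 and o8 look identical
    [1.0, 0.9],
    [1.0, 0.2],   # rear view: the marker of o8 is missing
])
max_result = fuse_max(views)          # o8 keeps its early high confidence: a tie
prod_result = fuse_product(views)     # the decisive rear view separates o7 from o8
```

Once the wrong hypothesis o8 has received a high value in any view, maximum fusion can never lower it again, whereas the product lets the single decisive rear view dominate.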


9.6 Averaging Active Fusion 


We have seen that disjunctive fusion cannot integrate results correctly because it allows too many positive hypotheses to influence the final result. Conjunctive fusion schemes, on the other hand, perform very well in extracting the few decisive observations. But we should also expect conjunctive schemes to be sensitive to outliers, because they depend strongly on each single negative result. Averaging approaches to information integration are a sort of compromise between conjunctive and disjunctive fusion [24].


Here we describe the recognition results that have been obtained in test runs using 
the family of weighted averages defined through eq.s (8.34) and (8.35), and the averaging 
scheme given by eq.s (8.36) and (8.37). 


It is well known that one strength of averaging aggregation operators is their inherent capability to suppress outliers due to their compromising behavior [67]. We have performed two sets of runs: one at zero outlier rate and one with an increasing rate of outliers. By “outlier” we mean a failure of the object-pose classification algorithm. Since we have studied active fusion in a quite constrained environment, it has been possible to achieve a negligible rate of outliers for all the experiments reported so far. But failures may obviously arise for a variety of reasons, like severe segmentation errors, instabilities in lighting etc. We have simulated these effects by randomly distorting the extracted feature vector in a certain percentage of the observations.


Let us begin with the family of weighted averaging rules at zero outlier rate. The 
results obtained with this family of fusion schemes are depicted in figures 9.20 and 9.21. 
We note that if there are no outliers: 


• Averaging schemes do not reach the recognition rates of conjunctive approaches, even though they come close in very long runs.


• A “balanced” averaging scheme (w ≈ 1) performs best during the first few steps, but in the long run it is better to be conservative (w > 1).


This latter finding may be explained by the fact that in the long run the right hypotheses will have received higher confidences on average, and it will usually be better not to let new contradictory observations affect the outcome of the fusion scheme dramatically. We find no qualitative differences if the experiments are performed only with objects o7 and o8.
Things become more interesting when we increase the rate of erroneous observations. In order to study the effect of wrong recognition results, we have artificially increased the percentage of outliers (0%..60%) and compared the behavior of the probabilistic product fusion scheme (“product rule”) given by eq.s (8.5)-(8.6) with that of the averaging scheme (“sum rule”) defined through eq.s (8.36)-(8.37) at increasing outlier rates. The results of this comparison are depicted in Fig. 9.22.


The product rule behaves well up to a rate of around 10% of mis-classifications (Fig. 9.22b). Beyond that point, fusion is no longer able to recover fully from wrong results. When 60% of the observations are outliers, the system is not able to recover at all before the next outlier is encountered (Fig. 9.22d). Consequently, the fused results get worse than the original result obtained from a single observation.


On the other hand, the system using the sum rule needs longer to recognize the object 
if only a few outliers are present. However, as the rate of mis-classifications increases 
the system robustly continues to improve recognition rate through multiple observations. 
Even with 60% outliers it reaches a final recognition rate above 80%. This result occurs 
because we have assumed mis-classifications to be due to random errors. Hence, in the 
sums in eq. (3.11) and eq. (3.12) the correct hypothesis will be favored in about 40% of 
the cases while the contributions in the remaining 60% of the cases are equally distributed 
over the wrong hypotheses. This general tendency of increased resistance against outliers 
will also be present for systematic errors. 
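The argument can be illustrated with a small Monte-Carlo sketch. It replaces the feature-space distortions used in the experiments by a simplified, hypothetical outlier model: a regular observation puts 0.6 on the true object, while an outlier puts 0.9 on a randomly chosen wrong object and almost nothing on the true one. The product rule is implemented as a sum of logarithms and the sum rule as an unweighted average (represented by the running sum, which has the same arg max); both are stand-ins for eq.s (8.5)-(8.6) and (8.36)-(8.37), not reproductions of them. In this toy model every outlier multiplies the true object's product confidence by a tiny factor, whereas under the sum rule each outlier merely adds a bounded amount to one wrong hypothesis.

```python
import numpy as np

N_OBJ = 8

def observation(true_obj: int, outlier_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Toy per-view posterior over N_OBJ objects (hypothetical numbers)."""
    if rng.random() < outlier_rate:
        wrong = rng.choice([o for o in range(N_OBJ) if o != true_obj])
        p = np.full(N_OBJ, (1.0 - 0.9 - 1e-4) / (N_OBJ - 2))
        p[wrong] = 0.9           # outlier strongly favors a wrong object
        p[true_obj] = 1e-4       # and nearly rules out the true one
    else:
        p = np.full(N_OBJ, 0.4 / (N_OBJ - 1))
        p[true_obj] = 0.6        # regular view favors the true object
    return p

def run(outlier_rate: float, n_steps: int = 30, n_trials: int = 500, seed: int = 0):
    """Recognition rates (product rule, sum rule) after n_steps observations."""
    rng = np.random.default_rng(seed)
    hits_prod = hits_sum = 0
    for _ in range(n_trials):
        true_obj = int(rng.integers(N_OBJ))
        log_prod = np.zeros(N_OBJ)   # product rule, accumulated in log space
        total = np.zeros(N_OBJ)      # sum rule (same arg max as the average)
        for _ in range(n_steps):
            p = observation(true_obj, outlier_rate, rng)
            log_prod += np.log(p)
            total += p
        hits_prod += int(np.argmax(log_prod) == true_obj)
        hits_sum += int(np.argmax(total) == true_obj)
    return hits_prod / n_trials, hits_sum / n_trials
```

With 60% outliers the sum rule still identifies the object in most runs of this toy model while the product rule almost never does; the exact percentages depend entirely on the invented numbers.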


We conclude that the sum rule provides a robust fusion scheme in the presence of
outliers which clearly outperforms the product rule once the outlier rate exceeds
approximately 10%. On the other hand, the sum rule is not a “perfect” solution for active
fusion purposes. Fusion by averaging is less effective because it necessarily treats
“decisive” singular observations like outliers. Consequently, more steps are needed to
reach a recognition rate similar to the rate obtained by the product rule when the outlier
rate is low.
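The trade-off between the two rules can be illustrated with a small Python sketch. It assumes a uniform prior, so that the product rule reduces to normalizing the element-wise product of sum-normal confidence vectors, and it models the sum rule as a plain average; the exact forms of eq.s (8.5)-(8.6) and (8.36)-(8.37) may differ. The vectors `good` and `outlier` are made-up illustrations, not measured data.

```python
import numpy as np


def product_rule(observations):
    """Normalized element-wise product of confidences (uniform prior assumed)."""
    fused = np.ones_like(observations[0])
    for c in observations:
        fused = fused * c
        fused = fused / fused.sum()
    return fused


def sum_rule(observations):
    """Plain average of sum-normal confidence vectors."""
    return np.mean(observations, axis=0)


good = np.array([0.6, 0.2, 0.2])           # weak but correct vote for hypothesis 0
outlier = np.array([0.001, 0.998, 0.001])  # confident vote for wrong hypothesis 1

clean = [good, good, good]
noisy = clean + [outlier]

for name, rule in (("product", product_rule), ("sum", sum_rule)):
    print(name, "clean:", np.round(rule(clean), 3), "noisy:", np.round(rule(noisy), 3))
```

In this toy case a single confident outlier is enough to overturn three consistent observations under the product rule, while the average still favors hypothesis 0. Without the outlier, however, the product rule is far more decisive (about 0.93 versus 0.6 for the leading hypothesis), which mirrors the slower recognition observed for the sum rule at low outlier rates.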


We can make another very interesting observation. In Fig. 9.23 we compare the
recognition rates obtained with the robust sum rule in planned runs with the rates achieved
in runs using random motions. Again the rate of outliers increases from 0% to 60%. The
plots show clearly that the effectiveness of planning is negatively correlated with the rate
of outliers. The performance improvements due to planning degrade gracefully as the
outlier rate increases. Once the rate of outliers exceeds a (relatively high) threshold
(somewhere above 30% in our case), planning becomes virtually useless and the system
is no longer able to outperform a system making random choices to any significant extent.
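Why planning loses its edge can be sketched by letting the outlier rate blur the predicted outcome of each action. In the following Python sketch, which is an illustrative assumption and not the planning equations used in the experiments, the predicted classifier output for action a and object o is mixed with a uniform distribution, q = (1 − ε)·lik[a][o] + ε/N, and each action is scored by the expected entropy of the fused result. The table `lik` and the vector `belief` are hypothetical numbers.

```python
import numpy as np

# Hypothetical classifier-output table: lik[action][true_object] is the
# sum-normal confidence vector predicted for that view and object.
lik = np.array([
    [[0.45, 0.45, 0.10], [0.45, 0.45, 0.10], [0.05, 0.05, 0.90]],  # view 0
    [[0.80, 0.10, 0.10], [0.10, 0.80, 0.10], [0.10, 0.10, 0.80]],  # view 1
])
belief = np.array([0.45, 0.45, 0.10])  # current fused confidences


def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def expected_entropies(belief, lik, eps):
    """Expected entropy after each action if a fraction eps of outcomes are outliers."""
    n = belief.size
    scores = []
    for lik_a in lik:
        q = (1.0 - eps) * lik_a + eps / n  # blur predictions towards uniform
        score = 0.0
        for o in range(n):
            post = belief * q[o]  # product fusion, uniform prior assumed
            score += belief[o] * entropy(post / post.sum())
        scores.append(score)
    return np.array(scores)


def planning_gap(belief, lik, eps):
    """Difference between the scores of the worst and the best action."""
    scores = expected_entropies(belief, lik, eps)
    return float(scores.max() - scores.min())


for eps in (0.0, 0.3, 0.6, 1.0):
    print(f"eps={eps:.1f}  gap={planning_gap(belief, lik, eps):.4f}")
```

As ε grows the scores of all actions converge, and at ε = 1 every action is predicted to have the same effect, so the choice made by the planner no longer matters.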


Note that the planning module is thus more easily and more severely affected by
increased outlier rates than the fusion module. In our experiments averaging fusion
achieves remarkable recognition rates even at outlier rates around 60%, while the
improvements due to planning become negligible. We consider these results to be
symptomatic of active fusion in general: planning the next action cannot be effective
unless the outcome of that action can be estimated with some certainty. In a highly random
environment any move may be as good as any other. This experimental observation clearly
shows one of the sine qua nons of active fusion: while fusion alone can be made robust,
the active planning part depends much more strongly on a predictable environment.


9.7 Different Measures of Imprecision 


Having studied the behavior of the algorithm for different fusion schemes and various 
uncertainty calculi we would now like to focus on the effects due to the use of different 
measures of imprecision. 


Measures of Numerical Imprecision 


We begin this section with the discussion of some test cases in which three measures of
numerical imprecision have been compared within the probabilistic model (see eq.s
(8.11)-(8.13)).


As mentioned in section 9.2.2 we are using an extended set of objects in these
experiments in order to demonstrate clearly the differences between the considered
measures of numerical imprecision. The test runs that we discuss in this section also
differ in another aspect from the experiments undertaken so far: we consider the same runs
for eigenspaces of increasing dimensionality (3, 5, 10). By performing such extended
experiments we are able to give an experimental demonstration of the persistence of effects
which are due to true symmetries of the objects. If, however, symmetries only seem to
occur (because the internal representation is too coarse and makes different views appear
similar), the relevant effects will be observed only for low-dimensional feature spaces.
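The effect of a too coarse representation can be reproduced with a toy example. The Python sketch below treats four hypothetical three-dimensional feature vectors as stand-ins for images: pairs of views differ only by a small detail (the third component, playing the role of the writing on a cup). A principal component projection computed with NumPy's SVD makes the paired views coincide when only the first eigenvector is kept, and separates them once the second eigenvector is added. The data and the helper `project` are illustrative assumptions; the parametric eigenspace used in the experiments is built from real image sets.

```python
import numpy as np

# Hypothetical feature vectors of four views: the first component varies
# strongly, the third encodes a small detail that tells paired views apart.
views = np.array([
    [10.0, 0.0, 0.5],
    [10.0, 0.0, -0.5],
    [-10.0, 0.0, 0.5],
    [-10.0, 0.0, -0.5],
])


def project(X, k):
    """Project the centred data onto its k leading principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


for k in (1, 2):
    Y = project(views, k)
    print(f"k={k}: distance between views 0 and 1 = {np.linalg.norm(Y[0] - Y[1]):.3f}")
```

With k = 1 the two views are indistinguishable (a "pseudo-symmetry"), whereas with k = 2 they are one unit apart, which is the kind of effect that disappears as the eigenspace dimension grows.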


For each of the objects o9..o14 characteristic test results are depicted in figures
9.24..9.29. The new pose estimation and the fused pose estimation are depicted for every
third step in a sequence of 30 steps (confidence vs. angle). These figures share a common
scale on the horizontal axis (angle) but not on the vertical axis (confidences). Since the
distributions are sum-normal, the area between the horizontal axis and the plot of
confidence values is identical for all the figures⁵. All pose estimations are depicted in
the same angular reference system as explained in section 3.3.1. Thus success of the
algorithm can be judged by the development of a peak at the correct pose.
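The transformation into the common reference system and the subsequent fusion can be sketched as follows. The sketch assumes 5° pose bins, assumes that a camera displaced by Δ sees the object at the relative pose φ − Δ, and uses a plain normalized product for fusion; these conventions, the helper names and the synthetic confidence vectors are hypothetical. Each observation has a peak at the correct relative pose and a slightly weaker spurious peak at a varying position, and only the correct peaks line up after the shift.

```python
import numpy as np

STEP_DEG = 5
N_BINS = 360 // STEP_DEG  # 72 pose bins


def observation(rel_true_deg, rel_spurious_deg):
    """Synthetic sum-normal confidences in the camera frame: a peak at the
    correct relative pose and a slightly weaker spurious peak."""
    c = np.full(N_BINS, 0.1)
    c[(rel_true_deg % 360) // STEP_DEG] = 1.0
    c[(rel_spurious_deg % 360) // STEP_DEG] = 0.9
    return c / c.sum()


def to_common_frame(conf, camera_offset_deg):
    """Shift camera-relative confidences into the turn-table frame (assumed convention)."""
    return np.roll(conf, camera_offset_deg // STEP_DEG)


true_pose = 250
fused = np.full(N_BINS, 1.0 / N_BINS)
for offset, spurious in [(0, 70), (90, 20), (200, 300)]:
    obs = observation(true_pose - offset, spurious)
    fused = fused * to_common_frame(obs, offset)
    fused = fused / fused.sum()

print("fused pose estimate:", int(np.argmax(fused)) * STEP_DEG, "degrees")
```

Every observation, whatever the camera position, contributes to the bin at 250° once it has been shifted into the turn-table frame, while the spurious peaks land at different angles and are suppressed by the product.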


At the bottom of each figure the values of the three discussed measures σ², H_σ², σ̃²
of numerical imprecision (see section 8.1.3) are shown for each of the 30 steps (numerical
imprecision vs. step). In these figures the scales on the horizontal axis (30 steps) and the
vertical axis (0..1) are always constant.


We continue by discussing the results obtained for each of the objects separately. Let 
us begin with a case where all three measures of numerical imprecision perform well. 


⁵The scale on the vertical axis can be derived from this observation.
(j) Step 20   (k) Step 25   (l) Step 30


Figure 9.20: Planned Runs, Objects o1,..,o8, 30 Steps, Fusion a” + b. Results of
experiments with fusion based upon the weighted average, eq.s (8.34),(8.35). The figures
in the top row depict the surface of recognition rate over number of steps and weight w.
For visualization purposes the same surface is depicted from different view-points. The
plots below correspond to slices cut through the surface of the top row. These plots depict
recognition rate at a specific number of steps over weight w. For example, figure (d)
shows the recognition rate after the 2nd step for different algorithms indexed by weight w.
The weight w has been varied from 0..4. The measured data points and their error-bars are
included in the plots. Each of the objects o1,..,o8 has been placed on the turn-table at
each of its initial poses and 30 test runs have been performed for every possible initial
condition.
(d) Step 2   (e) Step 3   (f) Step 4   (g) Step 5   (h) Step 10   (i) Step 15   (j) Step 20   (k) Step 25   (l) Step 30


Figure 9.21: Planned Runs, Objects o7, o8, 30 Steps, Fusion a+b. Results of experiments
with fusion based upon the weighted averaging operation, eq.s (8.34),(8.35). The figures
in the top row depict the surface of recognition rate over number of steps and weight w.
For visualization purposes the same surface is depicted from different view-points. The
plots below correspond to slices cut through the surface of the top row. These plots depict
recognition rate at a specific number of steps over weight w. For example, figure (d)
shows the recognition rate after the 2nd step for different algorithms indexed by weight w.
The weight w has been varied from 0..4. The measured data points and their error-bars are
included in the plots. Only objects o7 and o8 have been placed on the turn-table at each of
their initial poses and 30 test runs have been performed for every possible initial
condition.
Results for Object o9, Figure 9.24


The resulting pose estimations for object o9 are depicted in figure 9.24. The first
interesting conclusion to be drawn is that the Japanese writing on the front of the cup
is not captured by a three-dimensional eigenspace. All feature vectors corresponding to
different views of the cup are located in a very small neighborhood. They cannot be
disambiguated with such a low number of eigenvectors, as can be seen from the fact that
even after 30 steps no pronounced peak can be observed. The variation of confidence values
for the fused poses never exceeds 0.01. It is, however, interesting to observe that the
tiny beginnings of a peak are indeed located around the right position (φ = 250°) even for
the three-dimensional classifier. The test run exemplifies a situation where active fusion
cannot really help because the classifier is simply not able to disambiguate between
different poses. Observing the object from a different view-point cannot improve such a
situation.


Increasing the dimension of the eigenspace to five makes the problem solvable with an
active fusion approach. It should be observed that each single new pose estimation still
appears to be rather ambiguous. However, the poses around φ = 250° nearly always obtain
a slightly higher confidence⁶. It is especially instructive to compare the first rows in
figure 9.24, where the system has captured the image corresponding to φ = 250°. The
“plateau” of confidence values corresponds to those views that are virtually
indistinguishable from the view φ = 250°. Only for φ ≈ 0° is a “valley” of confidence
values found, because of the Japanese writing. Because subsequent observations are
consistent, the fused pose estimation develops a peak at the pose φ = 250°. The same is
true for the eigenspace of dimension 10. Note also that the assumed errors in positioning
and lighting are rather large in these experiments (displacements of 15 pixels were used
for the learning data). Hence the pose cannot be estimated as accurately as before, but the
peak of the distribution is still located at the correct position.


In the runs performed with the ten-dimensional eigenspace the correct pose has been 
clearly identified and even the individual new pose estimations almost consistently favor 
the correct pose. 


When comparing the measures of imprecision we find that σ² and H_σ² give quantitatively
very similar results while σ̃² covers a bigger range of values.


The pose estimations are extremely imprecise in the 3d case. σ² stays very close to the
value 2/3, which is obtained for uniform distributions⁷. In accordance with the discussion
in chapter 7, the variance σ² does not attain its maximum value 1 in this case, while the
other two measures stay very close to the maximum value 1. The fused poses have an
extremely slight tendency to develop a peak around φ = 250° (the difference between the
minimum and maximum confidence value is less than 0.01). During the evaluation of σ̃² the
distribution is max-normalized, which makes it very sensitive to variations around the
uniform distribution⁸.


⁶When evaluating the results it should not be forgotten that observations from different
camera positions all contribute to φ = 250°, since we transform the pose estimation to the
common reference system.

⁷The scale on the vertical axis of the figures for σ² and H_σ² does not allow resolving the
fine variations that can be observed in the figures of the distributions of confidences,
where the scale has been adapted dynamically to enhance the details.


In the 5d eigenspace scenario σ² and H_σ² still fluctuate around values close to their
maxima (2/3 and 1) for new pose estimations. The recognition of the correct pose is
reflected in the pronounced decrease of numerical imprecision. The values obtained after
30 steps are 0.096 and 0.103 respectively. Measure σ̃² again reflects the larger
fluctuations visible in the figures of the confidence values. Its value for the new pose
estimations fluctuates around 0.8. Thus only σ̃² captures the difference in the
distributions of confidences for the new poses between the 3d and the 5d case. For the
fused poses we obtain a value of σ̃² = 0.003 after 30 steps. This value can no longer be
resolved on the scale chosen for the figures of measures of numerical imprecision.


Qualitatively very similar results are obtained for the ten-dimensional eigenspace.
Numerical imprecision decreases even more quickly, but again σ² and H_σ² attain relatively
high values when compared to σ̃².


Results for Object o10, Figure 9.25


Object o10 has been included to check the consistency of the approach. Clearly no
improvement can be achieved by active fusion because the object’s different poses are
de facto indistinguishable. It is interesting to note that σ̃² follows the extremely tiny variations
that develop in the distributions of confidence values after about 10 steps. These
artifacts are negligible (order of magnitude 10⁻⁴) and void of any physical significance.
Consequently, for σ̃² the range of values ≈ 0.8..1 should not be considered to reflect true
improvements of the pose estimation but rather only statistical fluctuations. This is
another example of the extreme sensitivity of σ̃² to deviations from the uniform
distribution. Since the goal of action planning for the pose estimation module is to get
away from uniform distributions, the sensitivity of σ̃² is actually a positive feature. By
using σ̃², large differences in numerical imprecision will be obtained between the uniform
distribution and the best non-uniform distribution to be expected for the next move.


Results for Object o11, Figure 9.26


While in the above examples none of the measures behaves counter-intuitively, this is no
longer true for object o11. The pose of object o11 cannot be determined unambiguously
because of the rotational symmetry of the object. For each pose there is an identical
view at an angular difference of 180°. By fusing different observations the system is able
to extract the correct pose hypothesis modulo 180°, which is the best result that can be
achieved in such a situation.


⁸The effect of renormalization is also visible in all distributions of confidence values
depicted in figure 9.24, since we have adapted the scale on the vertical axis to the
maximum confidence value. Thus measure σ̃² is the appropriate measure if one tries to
analyze the distributions at the dynamic scale we have chosen for visualization.
The measures of imprecision display strongly differing behaviors in the considered
case. Numerical imprecision as judged by σ² increases for the fused pose estimations,
even though the pose gets more and more precise (though obviously only modulo 180°).
The increase of σ² in this case is due to the development of two peaks. As two sharply
peaked confidences for two hypotheses at an angular distance of 180° emerge, σ² attains
its maximum value of 1. The effect of the active fusion process as judged through σ² is
thus an increase in numerical imprecision. For object recognition purposes this judgment
runs against common sense. The exemplified behavior is clearly unwanted for active fusion
tasks. H_σ² has been introduced to fix σ²’s shortcoming for symmetric objects and is thus
decreasing. However, only σ̃² displays a satisfactory decrease to low values.


Why bother? After all, we can see that the system is still able to obtain the correct
results even for increasing σ². However, the correct pose is obtained solely because fusing
different pose estimates works properly, while σ²’s shortcoming severely affects the action
planning phase. If view planning is based upon σ², the worst instead of the best action is
chosen. As the object-pose classifier and the fusion module work properly, the failure of
the planning module does not lead to a failure of the whole algorithm. Nevertheless, our
main emphasis lies on how to improve the fusion methodology by active planning, and we
are greatly disturbed by a measure of numerical imprecision which makes the planning
phase not only useless but counter-productive.
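The failure mode can be illustrated numerically. The variance-like measure below uses a normalization that is our assumption, chosen to reproduce the values quoted in the text (about 2/3 for the uniform distribution and 1 for two equal peaks 180° apart): it computes the expected squared circular distance between two independent pose samples and divides by 180²/2. It is not necessarily identical to eq.s (8.11)-(8.13). The second measure is the Shannon entropy normalized to [0, 1]. The two candidate outcomes are hypothetical: action A yields two sharp peaks at 70° and 250°, action B leaves the pose distribution uniform.

```python
import numpy as np

STEP_DEG = 5
ANGLES = np.arange(0, 360, STEP_DEG)  # 72 pose bins


def circular_variance_like(c):
    """Assumed normalization: E[d(phi, phi')^2] / (180^2 / 2), d = circular distance."""
    diff = np.abs(ANGLES[:, None] - ANGLES[None, :]) % 360
    d = np.minimum(diff, 360 - diff).astype(float)
    return float(c @ (d ** 2) @ c) / (180.0 ** 2 / 2)


def normalized_entropy(c):
    """Shannon entropy divided by its maximum log(number of bins)."""
    p = c[c > 0]
    return float(-(p * np.log(p)).sum() / np.log(c.size))


uniform = np.full(ANGLES.size, 1.0 / ANGLES.size)
two_peaks = np.zeros(ANGLES.size)
two_peaks[[70 // STEP_DEG, 250 // STEP_DEG]] = 0.5

outcomes = {"A (two antipodal peaks)": two_peaks, "B (uniform)": uniform}
for measure in (circular_variance_like, normalized_entropy):
    scores = {name: measure(c) for name, c in outcomes.items()}
    best = min(scores, key=scores.get)  # planner picks the least imprecise outcome
    print(measure.__name__, {k: round(v, 3) for k, v in scores.items()}, "->", best)
```

Planning by the variance-like measure prefers B, i.e. the action that leaves the pose completely undetermined, whereas planning by the entropy prefers A, which pins the pose down modulo 180°.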


Results for Object o12, Figure 9.27


For object o11 the bi-modal distribution of the confidences for the new pose estimation
is due to the physical symmetry of the object. With object o12 we encounter another case
in which small details are not very well distinguished because of the great variations
among the input images. Consequently, the distribution of new confidence values for object
o12 shows two peaks: one at the correct pose φ = 250° and one at φ = 250° − 180° = 70°.
Both poses are side views of the car. Since the correct pose consistently obtains slightly
higher values for the new pose estimations, we finally obtain a distinct peak in the fused
pose estimation.


The most interesting aspect of this example is the behavior of the measures of numerical
imprecision for the new pose estimation. Since the new poses display two peaks at maximal
distance, σ² is again higher than it would be for uniform distributions. This is not true
for H_σ², but its value is still very close to the maximum value 1. Only σ̃² differs
remarkably and clearly shows the difference between the uniform distribution and the
bi-modal distribution.


For the fused pose estimations all three measures give decreasing values. Since the
pose estimation develops a remarkable peak for all dimensions of the eigenspace, the values
obtained by σ̃² are very small and below the resolution of the plots.
Results for Object o13, Figure 9.28


The handle of object o13 is a much more distinct feature than the writing on the Japanese
cup. Even with a 3d eigenspace we obtain a reasonable pose estimate. The distributions
in the first line of the figure are obtained at step 0, i.e. for φ = 250°. Two plateaus can
be observed: one around the correct pose φ = 250° and the other around poses from the
front of the object (φ = 100°). Thus the system is not able to recognize the handle if it
points towards the camera. For poses around φ = 0° and φ = 180° the handle can be seen
from the side. In the top-row distributions the confidences for these poses are clearly
lower, indicating that the system can distinguish between these poses and the frontal or
back views. The fact that the correct pose can be obtained after several fusion steps is
thus only due to the motions of the camera and the consistent observations for the correct
pose estimation. A static scenario would only be able to tell apart frontal and rear views
from side views.


In this example all three measures reflect the decrease in numerical imprecision of the 
fused pose estimation in an intuitively pleasing way. 


Results for Object o14, Figure 9.29


In the test runs for object o14 we see once again that the handle of the cup allows for a
correct pose estimation. The mean lies close to φ = 250°, but the standard deviation is
still about 25°. As the eigenspace dimension increases, the final distributions become more
narrowly peaked around φ = 250°.


When turning to the measures of numerical imprecision we can see that all measures
reflect the improvement of the fused pose estimation, but (at the chosen scale) only the
plots for σ̃² also display the improvements of new pose estimations.


Which Measure is Best?


In most cases all three measures of numerical imprecision give satisfactory results.
However, our “cabinet of curious objects” has revealed some interesting differences between
the three approaches.


If nearly symmetric objects (object o12, because of the internal representation) or fully
symmetric objects (object o11) are present in the database, σ² may increase even though
the sum-normal distribution of confidence values for the pose becomes less uniform. This
behavior is unwanted in active fusion applications.


Consequently, we have proposed to use H_σ², a combination of Shannon entropy and
variance. H_σ² decreases because the value of σ² is multiplied by the value of the entropy
(which decreases more quickly than σ² increases). Furthermore, H_σ² is a differentiable
measure of numerical imprecision. It can therefore also be applied easily in gradient-based
optimization algorithms. The disadvantage of H_σ² is that evaluating this measure
requires more steps because both entropy and variance have to be evaluated separately.


Measure σ̃² behaves nicely in all possible configurations. It is highly sensitive towards
deviations from the uniform distribution, but this feature makes the measure even more
attractive because it contributes to an effective action planning strategy. As the
distribution becomes less uniform, σ̃² decreases even if two peaks at opposite views evolve.
The formal simplicity of σ̃² makes it the candidate of choice for measuring numerical
imprecision in active object recognition. However, if differentiability of the measure is
required, it may be preferable to choose H_σ² instead.


How important is it to choose the right measure of imprecision? After all, we usually
do not expect too many truly symmetric objects in our database. To answer this question
let us repeat that many (not only man-made) objects display “pseudo-symmetries”, i.e. they
appear similar (though not identical) from various views. As the representational power of
the feature space decreases, such views have a good chance of being represented by more or
less identical feature values. This is quite often only a question of scale and resolution
of the representation. We have seen some examples in the described experiments. Thus the
question is not only whether we have truly symmetric objects in our database but also
whether our feature space is able to capture enough details, or whether
“pseudo-symmetries” appear like true symmetries to our system. The latter case can become
common for the eigenspace representation if large object databases are used while only a
comparatively small number of eigenvectors can be computed. We expect this phenomenon to
occur in general also for other low-dimensional feature spaces. Hence we conclude that
using the right measure of numerical imprecision might actually be a very important
ingredient of successful large-scale implementations of the active fusion approach.
Admittedly, it is only the action planning stage which gets ruined by the use of a wrong
measure of numerical imprecision. But again, it appears very counter-productive to the
active fusion paradigm to allow for failures of the action planning routine. Ordinary
static multi-sensor scenarios may well outperform the active fusion system if the view
planning module is implemented sloppily.


Before concluding this discussion let us note that σ̃² and σ² coincide for max-normal
distributions. The above discussion therefore mainly addresses the issue of measuring
numerical imprecision in probability theory (or any other theory dealing with sum-normal
confidences). It can be foreseen that researchers using probabilistic models will tend to
stick to the variance (or standard deviation) for measuring numerical imprecision.
Therefore we consider the given example of the breakdown of action planning when using
the variance (object o11) to be an important contribution towards a better understanding
of what properties define a good measure of accuracy for active fusion purposes.


9.8 Comparing Different Measures of Non-Specificity 


In chapter 6 we have reviewed and introduced various measures of non-specificity. The
discussion there has focused on covering as many theoretically conceivable cases as
possible. For example, we have introduced measures of non-specificity for general fuzzy sets
that are not subject to any constraints like max-normalization or sum-normalization. By
keeping the discussion general we have also provided a starting point for other researchers
in active fusion who will certainly also face the problem of how to quantify non-specificity
within a chosen uncertainty calculus.


In chapters 8 and 9 we have considered a few representative implementations of our 
active fusion algorithm. These exemplary implementations can necessarily include only 
some of the most prominent approaches. For instance, in our actual experiments we have 
only been using either max-normal or sum-normal distributions. For such constrained dis- 
tributions measures of non-specificity have already been introduced by other researchers 
and we have discussed the most prominent measures in chapter 6. While we consider the 
soundness of our approach to be already well established by the arguments put forward in 
chapter 6, we use this opportunity to also provide experimental evidence supporting our
new measures of non-specificity. In particular we will show here that the new measure H3
defined through eq. (8.26) is in fact a valid estimate of non-specificity. 


To this end we have performed experiments using three different measures of
non-specificity during planning in the possibilistic implementation. Figure 9.30 depicts the
recognition rate at each step when planning is based upon Lamata and Moral’s measure
eq. (8.24), Kaufmann entropy eq. (8.25) and our newly defined measure eq. (8.26).


Only negligible performance differences can be observed in runs where any object
from the complete database of models can be put on the turn-table. The approaches
deliver different results only when it comes to disambiguating very similar objects. In this
case Kaufmann entropy is outperformed both by Lamata and Moral’s measure and by our
proposal. When directly comparing Lamata and Moral’s measure to our measure, it can be
observed that planning based upon our measure performs better in the long run but is
beaten for the usual numbers of observations (< 10). Since the planning phase, when based
upon Kaufmann entropy, performs so much worse for similar objects, it must perform much
better for other objects to explain the more or less equal performance of all three
implementations on the complete database.


All measures have specific advantages and drawbacks. The important conclusion in 
the context of this thesis is that in those cases in which our new measures can be compared 
to existing measures we find that they are indeed very good estimates of non-specificity. 
The same reasoning which has led us to the introduction of eq. (8.26) has also provided
support for other more general measures of non-specificity in chapter 6. These measures 
can be applied beyond the realms of probability theory and possibility theory. The positive 
results obtained with eq. (8.26) constitute one particular experimental demonstration 
of the validity of the overall approach underlying the introduction of new measures of 
imprecision in chapter 6. 




9.9 Discussion 


We have presented experimental results for various different implementations of an active 
object recognition system for single object scenes. Depending on the amount of ambiguity 
in the current object classification the recognition task acquires new sensor measurements 
in a planned manner until the confidence in a certain hypothesis reaches a pre-defined
level or another termination criterion is reached. The uncertainty of the classification 
results has been modeled by probability theory, possibility theory, evidence theory and 
fuzzy logic. The use of uncertain object classifications (instead of hard decisions) leads 
to a natural formulation of view planning which is based on the expected increase in 
recognition quality gauged by entropy-like measures. Using this strategy the system
successfully identifies those regions in eigenspace where the manifold representations are 
well separated. The experimental results lead to the following conclusions: 
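The control loop summarized above can be sketched in a few lines of Python. The sketch is a simplified, hypothetical instance: it uses a uniform prior and product fusion, scores each candidate view by the expected Shannon entropy of the fused belief, simulates observations noise-free from an invented table `lik[view][true_object]` of classifier outputs, and stops once the leading hypothesis reaches a confidence threshold. None of these names or numbers come from the experiments reported here.

```python
import numpy as np

# Invented classifier outputs lik[view][true_object]: objects 0 and 1 look
# alike from view 0 but are well separated from view 1.
lik = np.array([
    [[0.45, 0.45, 0.10], [0.45, 0.45, 0.10], [0.05, 0.05, 0.90]],
    [[0.80, 0.10, 0.10], [0.10, 0.80, 0.10], [0.10, 0.10, 0.80]],
])


def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def fuse(belief, obs):
    post = belief * obs  # product rule with a uniform prior
    return post / post.sum()


def expected_entropy(belief, lik_view):
    # Average the entropy of the fused result over the objects we believe in.
    return sum(belief[o] * entropy(fuse(belief, lik_view[o]))
               for o in range(belief.size))


def active_recognition(lik, true_obj, first_view=0, threshold=0.85, max_steps=10):
    obs = lik[first_view][true_obj]
    belief = obs / obs.sum()
    chosen = []
    for _ in range(max_steps):
        if belief.max() >= threshold:  # termination criterion
            break
        view = int(np.argmin([expected_entropy(belief, lik[v]) for v in range(len(lik))]))
        chosen.append(view)
        belief = fuse(belief, lik[view][true_obj])  # simulated, noise-free observation
    return belief, chosen


belief, chosen = active_recognition(lik, true_obj=1)
print("views chosen:", chosen, "final belief:", np.round(belief, 3))
```

Starting from the ambiguous first view, the planner moves to view 1, where objects 0 and 1 differ, and a single additional observation suffices to pass the threshold.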


1. Even objects sharing most of their views can be disambiguated by an active move- 
ment that places the camera such that the differences between the objects become 
apparent. 


2. The planning phase is necessary and beneficial as random placement of the camera 
leads to distinctively worse results. The obtained results can also be used to compare 
our approach to a static multi-camera setup. A static system is not able to perform 
the right movement already at the beginning of the recognition sequence but rather 
has to hope that it will capture the decisive features with at least one of the cameras. 
We have seen that using a random strategy the system always needs significantly 
more steps to reach its final recognition level. This fact translates to the assertion 
that a multi-camera system with randomly but statically placed cameras will on the 
average need a correspondingly high number of cameras to obtain a recognition rate 
comparable to our active, single camera system for the used set of objects. 


3. The classifier for a single image can be simplified if multiple observations are allowed. 
In our experiments this is reflected by the fact that the dimension of the eigenspace 
can be lowered considerably. A recognition rate of 98% using only three eigenvectors 
for a model data-base like ours has not been reported before. We expect this result to 
imply that using eigenspaces of conventional sizes (≈ 30 eigenvectors) the number
of distinguishable objects can lie far beyond the currently reported maximum of 
approximately 100 objects. 


4. Probability theory, possibility theory and evidence theory are all well suited to
implement the active fusion system. Our results favor the probabilistic approach.
When no planning step is used, the approach based on Dempster-Shafer theory outperforms
the possibilistic implementation after a few steps. With view planning switched on, the
possibilistic version outperforms the Dempster-Shafer approach and reaches the same
maximum average performance as the probabilistic implementation. Still, the probabilistic
approach attains this recognition rate faster than the others. We interpret the success of
the probabilistic scheme in our experiments to be due to the fact that the estimated
likelihoods p(g | o_i, φ_j) are quite reliable, and thus Bayesian reasoning is the
well-founded and most consistent scheme for further processing.


5. Conjunctive fusion schemes perform well in active recognition while disjunctive
schemes fail to integrate results correctly. If a considerable number of errors is to be
expected at the recognition stage, severe fusion schemes may be replaced by a more
cautious approach like fusion by averaging. The experimental comparison indicates a
significant increase in resistance against outliers at the cost of reduced recognition
speed if the product rule is replaced by the sum rule in the probabilistic implementation.
Unfortunately, fusion based upon the sum rule is insensitive not only to outliers but also
to “decisive correct observations” that clearly rule out certain hypotheses. Thus the sum
rule is sub-optimal at low error rates.


6. The robustness of an active fusion algorithm is mainly due to the choice of the 
proper robust fusion operator and not so much to the active planning module. This 
has been established clearly by comparing the behavior of active systems using the 
same planning algorithm but different fusion operators, like the sum rule and the 
product rule. Nevertheless, the effectiveness of planning degrades gracefully as the 
rate of outliers increases, given that robust, averaging fusion schemes are employed. 


7. As the outlier rate increases, a point is reached from which onward planning cannot 
be effective any more. This happens because the system can no longer reliably 
predict the outcome of the next move. We expect this observation to indicate a 
breakdown of all active fusion approaches in the presence of unknown or rather 
unpredictable environments. 


8. Measuring numerical imprecision of the pose estimation by the standard deviation may
run into difficulties for uncertainty calculi based upon sum-normal distributions of
confidences (such as probability theory). The problem is that the standard deviation does
not assume its maximum value for uniform distributions, while such distributions should be
considered to represent the most imprecise result that can be obtained. New measures have
been proposed which perform well even in those cases in which the standard deviation fails
to capture the notion of numerical imprecision correctly.


9. The soundness of our approach to introducing new measures of non-specificity has 
been demonstrated by comparison with existing estimates of imprecision. 


We have now closed the circle of arguments that was opened in chapter 3 with the
presentation of our general active object recognition scheme. We have not only provided
deeper insights into important issues in active fusion (such as how to measure imprecision)
but have also shown theoretically and experimentally that our active fusion algorithm allows
for different successful realizations. The following chapter provides a critical discussion of
what has been achieved and of what the experience gathered with the presented model
suggests for the future.
Figure 9.22: Recognition rate versus active fusion steps when the classification module produces
a certain percentage of wrong results (outliers) during planned runs. The plots in
the first row depict the average recognition rate using the product fusion scheme,
eqs. (3.8)-(3.10). The plots in the second row depict the obtained recognition rate
using an averaging fusion scheme, eqs. (3.11)-(3.12). The percentage of wrong
classifications (outliers) increases from left to right.
Figure 9.23: Recognition rate versus active fusion steps when the classification module produces
a certain percentage of wrong results (outliers). The plots in the first row depict
the average recognition rate if action planning is switched on and the sum rule,
eqs. (3.11)-(3.12), is used for fusion. The plots in the second row depict the obtained
recognition rate if actions were selected randomly and the same sum rule is used for
fusion. The plots in the third row depict the difference between the corresponding
plots in the first and second rows. The percentage of wrong classifications (outliers)
increases from left to right.
Figure 9.24: Resulting pose estimations during representative test runs. Object og is placed on
the turn-table at y = 250°. The experiment has been repeated with eigenspaces of
3, 5 and 10 dimensions. During each test run the system performs 30 steps. The
new pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Resulting pose estimations for object o19
Figure 9.25: Resulting pose estimations during representative test runs. Object o19 is placed on
the turn-table at y = 0°. The experiment has been repeated with eigenspaces of 3,
5 and 10 dimensions. During each test run the system performs 30 steps. The new
pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Resulting pose estimations for object o11
Figure 9.26: Resulting pose estimations during representative test runs. Object o11 is placed on
the turn-table at y = 90°. The experiment has been repeated with eigenspaces of 3,
5 and 10 dimensions. During each test run the system performs 30 steps. The new
pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Resulting pose estimations for object o12
[Plot panels for steps 0 to 30: New pose 3d | Fused pose 3d | New pose 5d | Fused pose 5d | New pose 10d | Fused pose 10d]
Figure 9.27: Resulting pose estimations during representative test runs. Object o12 is placed on
the turn-table at y = 250°. The experiment has been repeated with eigenspaces of
3, 5 and 10 dimensions. During each test run the system performs 30 steps. The
new pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Resulting pose estimations for object o13
Figure 9.28: Resulting pose estimations during representative test runs. Object o13 is placed on
the turn-table at y = 250°. The experiment has been repeated with eigenspaces of
3, 5 and 10 dimensions. During each test run the system performs 30 steps. The
new pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Resulting pose estimations for object o14
Figure 9.29: Resulting pose estimations during representative test runs. Object o14 is placed on
the turn-table at y = 250°. The experiment has been repeated with eigenspaces of
3, 5 and 10 dimensions. During each test run the system performs 30 steps. The
new pose estimation and the fused pose estimation are depicted. The corresponding
estimates of numerical imprecision at each of the 30 steps are depicted in the three
bottom rows.
Figure 9.30: Results of planned experiments (30 steps) with the possibilistic approach using different
measures of non-specificity. The first row depicts the recognition rate at each step
when planning is based upon Lamata and Moral's measure, eq. (8.24) (a), Kaufmann
entropy, eq. (8.25) (b), and our newly defined measure, eq. (8.26) (c). The recognition
rate is averaged over runs where all objects are put on the turn-table at all possible
poses. The second row depicts the differences in recognition rate between the three
approaches. The third and fourth rows display the same quantities that are depicted
in the first and second rows when only the very similar objects o7 and og are put on
the turn-table at all possible initial poses.
Chapter 10


Critical Discussion and Outlook 


In this thesis one exemplary active fusion algorithm for object recognition has been studied 
both theoretically and experimentally. Encouraging experimental and theoretical results 
have been obtained. The active object recognition system has clearly outperformed al- 
ternative static approaches. Both the information fusion and the action planning module 
have turned out to provide significant improvements over conventional systems relying on 
single observations. 


As stated in the introduction to this thesis, certain hopes are attached to the application
of active fusion in object recognition. The method is “expected to lead to reliable
results at reasonable costs as compared to a ‘brute force’ combination of all available
data sources” [103]. While we have been able to confirm many of our expectations (see
especially the discussion in section 9.9), we have not yet critically discussed the approach
taken. This final chapter is therefore devoted to such a critical review in light of the
findings reported in this thesis.


In particular we would like to address some of what we feel are the most important 
questions regarding the active fusion paradigm. 


Is Activity Necessary for Recognition ? 


When discussing our experiments we have given an example that cannot be solved
satisfactorily by a single observation at all: objects o7 and og look identical apart from the
marker on og. So clearly, there are some cases in which multiple observations are absolutely
necessary. One way to obtain them is to actively change the view-point¹.


Does this mean that activity is really necessary for recognition ? Biological vision is
always coupled to an actuator. But the reason for this is probably that it is easier to
survive if one is able to move, not that it is necessary to move in order to understand
visual inputs. On the contrary, human beings are usually able to interpret pictures of
scenes without moving at all. In a closely related discussion of edge detection, Biederman
has argued that “biological vision thus furnishes an existence proof and, therefore, an
inspiration to attempts at achieving edge extraction by machines. From this perspective,
the appeal to stereo and motion by the ‘active vision’ community represents a premature
admission of defeat” [43]. It is not unlikely that Biederman would make the same comment
about the complete object recognition process.

¹Another approach might consist in a static multi-camera setup.


From this perspective, it would indeed be interesting to investigate how often human
beings encounter cases in which they have to change their view-point to achieve reliable
recognition. Personally, we feel confident in expecting that in the majority of conceivable
situations activity will not be necessary for recognition.


But what about the findings of Aloimonos and others that ill-posed problems can be
solved by active vision ? One should not overlook the fact that the ‘active vision’ community
has only shown that analyzing static scenes without prior knowledge is ill-posed. It has
been established beyond doubt that observing the same scene from different viewpoints
can make various problems related to the reconstruction and/or recognition of objects
well-posed even without any additional knowledge. The fact that human beings are able
to solve ill-posed vision problems without observing the scene a second time indicates
that there are other ways to resolve the dilemma. In fact, we consider it very likely that
additional knowledge about the appearance of objects and materials guides the brain in
choosing the correct solutions when interpreting pictures. Active vision can provide
solutions to ill-posed problems, but it is certainly not the only way to deal with them.


The active fusion paradigm also comprises the idea of soft actions, i.e. planned top-down
loops during the classification of a single image. Whether such loops really occur in
biological vision is very hard to answer. Some considerations on the time human beings
need to interpret visual input seem to imply that elaborate planning strategies cannot be
involved, unless planning and re-processing could be performed at improbably high speeds.


To sum up, we think that strong arguments can be put forward to substantiate the
hypothesis that (in the majority of possible cases) activity is not absolutely necessary
for recognition. When trying to build machines that should be able to “see”, however, we
are obviously interested in more than whether activity is absolutely necessary. From the
engineer’s point of view, the question whether activity is beneficial is even more important.


Is Activity Beneficial for Recognition ? 


We have been able to show some clear advantages of active fusion. Our active fusion sys- 
tem is more robust than the corresponding static system, it reaches a higher recognition 
rate through fusion and it still recognizes the objects as quickly as possible by carefully 
choosing the next motion. 


Is this the full answer to the above question ? As could have been expected, each of
the above advantages also implies certain disadvantages in some cases.
Actions and Motions


The possibility to move gives the system great freedom in solving problems efficiently.
On the other hand, if problems can be solved without moving, this option should certainly
be preferred. Motions are costly, both in time and in energy. This is probably why
biological vision works well without performing motions: animals that need to move to
another viewpoint in order to understand what they see are more prone to become a good
meal for predators.


Besides the cost of physical actions there are two other cost terms involved in each
active fusion algorithm: the cost of fusion and the cost of planning. For successful fusion
the system will in general

• need to register the information to be fused.

In order to be able to plan, the system

• needs additional knowledge about the problem

  - to analyze the current situation and

  - to predict the outcome of possible actions,

• and it needs time to evaluate various possibilities.


The registration step can sometimes be avoided, as our active object recognition
algorithm also demonstrates. The cost of planning, however, will always arise. Only systems
performing random motions or a fixed, preselected sequence of actions, and static
multi-camera systems, need not address any of the above issues². Let us discuss in more
detail the two conditions that have to be fulfilled for planning to be successful:


Current State and Predictability 


Every active fusion system needs to analyze the current state as part of its planning step.
Only if the obtained observations can be related to learning data is there a realistic chance
of predicting the outcome of possible actions. If a system is placed in a new environment,
active fusion will, by definition, usually not be possible. One can, however, conceive of a
system that automatically gathers knowledge in completely new environments such that
this knowledge can later be used for active fusion tasks (like active object recognition). On
the other hand, active fusion, as it has been applied until now, assumes the existence of an
extensive, pre-built knowledge base. This knowledge base must contain learning data to
be used for the recognition process, but it also has to contain knowledge about the possible
outcomes of available actions. Furthermore, the planning engine must possess some kind
of “rules” that map current states to possible outcomes. After evaluating the “quality”
of these outcomes, the best next action can be chosen³.


²Perhaps with the exception of how to find out when to terminate.

³This is the approach taken in this thesis. Alternatively, the rule base may map current states directly
to the best next action. In this case the assessment of the “quality” of actions has to be performed during
an extensive learning phase. See also [97, 96].
Another condition necessary for planning to be successful is the predictability
of the system. The question is whether the learned or calculated expectations for the
outcomes of certain actions are representative of the outcomes of actions under test
conditions. It may well be that an increased level of noise spoils much of the planning phase.


Within the scope of this work we are mainly interested in whether robustness can be achieved
by active fusion. The findings presented in chapter 9 strongly suggest that this is indeed
the case. We have not found any indication of failure for the information fusion part of
the algorithm if the right fusion operators are chosen. The information fusion paradigm
provides means to deal robustly with increased outlier rates. In order to suppress outliers,
a compromising fusion algorithm has to be chosen. Such fusion approaches are sub-optimal
at low outlier rates. If the exact rate of outliers cannot be predicted beforehand, it becomes
difficult to adapt the system optimally to the encountered conditions, but it is always
possible to stay on the safe side by using averaging fusion operators.
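The trade-off between a severe and a compromising operator can be illustrated with a small Python sketch. It assumes that product fusion multiplies the confidences of each hypothesis and renormalizes, and that sum fusion takes their arithmetic mean; these are generic forms, not a transcription of eqs. (3.8)-(3.12), and all confidence values are hypothetical.

```python
def normalize(conf):
    total = sum(conf)
    return [c / total for c in conf]

def product_fusion(observations):
    """Severe (conjunctive) fusion: multiply per-hypothesis confidences, then renormalize."""
    fused = [1.0] * len(observations[0])
    for obs in observations:
        fused = [f * o for f, o in zip(fused, obs)]
    return normalize(fused)

def sum_fusion(observations):
    """Compromising (averaging) fusion: arithmetic mean of per-hypothesis confidences."""
    return [sum(column) / len(observations) for column in zip(*observations)]

def winner(conf):
    """Index of the hypothesis with the highest fused confidence."""
    return max(range(len(conf)), key=lambda i: conf[i])

# Hypothetical confidences for three object hypotheses; hypothesis 0 is correct.
correct = [0.7, 0.2, 0.1]
outlier = [0.0, 0.1, 0.9]  # a decisive but wrong classification

clean_runs = [correct, correct]
noisy_runs = [correct, correct, outlier]

print(winner(product_fusion(noisy_runs)))  # 2: one outlier vetoes the correct hypothesis
print(winner(sum_fusion(noisy_runs)))      # 0: averaging tolerates the outlier
print(product_fusion(clean_runs)[0], sum_fusion(clean_runs)[0])  # about 0.907 vs 0.7
```

With only the two correct observations, the product rule reaches a confidence of about 0.91 for the correct hypothesis while averaging stays at 0.70, which is the sense in which averaging is sub-optimal at low outlier rates. A single decisive outlier, however, overturns the product rule, whereas averaging still selects the correct hypothesis.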


The studies performed with increasing outlier rates have also shown that planning
becomes less effective, but does so very gracefully. Only when the outlier rate rises above
roughly 30% does it become doubtful whether the costs involved in planning are still
justified. Still, these findings also imply that the planning stage clearly needs a
predictable environment in which it is possible to arrive at reasonable expectations of the
outcome of certain actions. It can be very reasonable to look out of the window before
leaving the house in order to decide whether to take an umbrella. But there is no point in
deciding today whether you should take your umbrella with you on the same day next
month. If there are too many uncontrollable factors, or if the situation cannot be analyzed
satisfactorily, we should not expect action planning to perform well.


Before continuing, we would like to make clear that there are sometimes other ways
to deal with outliers in active object recognition than making fusion more robust. For
instance, one can aim at making the hypothesis generation algorithm more robust by
using more sophisticated feature extraction modules to avoid segmentation errors, and
object classifiers which can also handle occlusions in multi-object scenes. Whenever this
is possible, it is certainly a very good idea to proceed along these lines. However, this
will not always be feasible. We have listed a number of sources of uncertainty in vision
in the introduction to this thesis, and not all of them can be eliminated easily. After all,
that was one of the major reasons for the introduction of the active fusion paradigm.


We think that the above findings are also relevant when considering the application of
active fusion to generic or qualitative object recognition. In generic and qualitative object
recognition the system can only have vague ideas about the current state (including the
pose of the object) and about what can be expected when performing certain actions.
Thus the situation resembles the one we encountered when increasing the outlier rate in
chapter 9. It will therefore be very important to choose the right fusion scheme. We expect
some sort of averaging fusion to be a good candidate for active approaches in qualitative
object recognition. Furthermore, it may well happen that active planning becomes
relatively ineffective. Of course, the approach will become more effective as more
knowledge about the object has been gathered. Still, it is an open question whether the
identity of the object will already be established during the first few steps (which will
perhaps have to be performed quasi-randomly) and, if this is not the case (indicating that
the object is hard to recognize), whether the system will be able to recognize the object
at all in longer runs (during which, especially at later stages, the action planning module
will finally be more effective, given that recognition succeeds). Active fusion and generic
object recognition are somewhat complementary in their requirements, which makes
research in this direction certainly worth pursuing in order to better understand the
limitations and capabilities of the active fusion framework.


Robust behavior can also be obtained in a static multi-sensor scenario, since robustness
is related mainly to the fusion of observations and not so much to planned activities.
The advantage of such an approach is that no time is lost for planning and repositioning
the sensing hardware (which makes active fusion unfit for very fast object recognition
applications). Furthermore, processing can often be done in parallel in static multi-sensor
setups. With enough cameras it becomes rather unlikely that discriminative views are
missed (though this obviously depends on the objects to be recognized). The disadvantages
of multi-sensor systems for object recognition are their increased costs and the underlying
assumption of total control over the object recognition environment. It is true that this
latter assumption has also been one of the premises of our work. But in general, active
systems (robots) have a higher potential than static multi-sensor applications to become
independent of physical constraints on their environment.


Equipped with these observations, we dare to answer the initial question: is activity
beneficial for recognition ? We have found compelling evidence that this can indeed be
the case. If you are willing (and able) to pay the price for being active, you can expect your
system to become more reliable at reasonable costs. We should, however, neither
underestimate the difficulties involved in planning the next action nor neglect the
necessary conditions for planning to be successful (i.e. a certain degree of predictability).
It may well be that these conditions will ultimately define the domain of applications for
the whole active fusion paradigm.


In the above analysis we have tried to reach very general conclusions from the limited 
number of results that have been obtained in this thesis. Hence, none of our conclusions 
can be considered to be proven beyond doubt. The limitations of the undertaken study 
suggest some necessary extensions of our approach. 


Limitations of the Current Study and Outlook 


The presented study clearly calls for more profound and extensive follow-up studies.
Only additional results in other settings will make it possible to establish the extent to
which the drawn conclusions are valid. Fortunately, our approach is amenable to various
extensions. In particular, we consider the following issues to be of importance for future
research:


• The presented implementations of the overall algorithm suffer from various
shortcomings. None of the considered implementations takes into account the temporal
and spatial constraints existing in active object recognition. Neither the possibilistic
nor the evidence theoretic implementation is “sophisticated” in the sense that it would
really exploit all the features of the respective uncertainty calculus. It would thus be
interesting to see how much can be gained with more elaborate implementations and
fusion modules. As stated above, fusion by the product rule is best suited for correct and
decisive observations, while fusion by the sum rule handles outliers best. Hence a
combined fusion scheme may be an interesting enhancement of the studied approaches.
In this context one could study fusion schemes that take the expected observations (from
the planning module) into account more explicitly: observations that fit one of the most
likely outcomes are fused with a severe operator, while observations that seem to
contradict the expectations are combined using a compromising operator.


• Planning has been based exclusively on the exhaustive enumeration and testing of
all possible actions. This approach is not feasible for all applications of active fusion.
We therefore need heuristics (rule bases, gradient-based optimization, etc.) to find a good
action for a given state. Machine learning techniques are currently being applied to this
problem, but other approaches can be conceived and should be compared to the results
already established [97].


• The considered application domain is quite restricted. We have used only one degree
of freedom of the active vision setup and a very limited number of objects. It remains
to be seen how the algorithm scales to larger problems.


• The eigenspace representation has dominated this thesis. We have repeatedly remarked
that other representations can be chosen as well. Applying the given algorithm to
recognition approaches based on other unary and binary feature spaces would therefore
be an important contribution towards establishing the generality of the considered
algorithm.


• The given approach relies solely on fusing the high-level object confidences. The
active fusion loop as depicted in Fig. 1.2 also considers the possibility of applying active
fusion during the object recognition process itself, e.g. for edge extraction. First
promising results in this direction have already been achieved, but the whole field
is still quite immature [109].


• The study has been limited to single-object scenes. The extension of the presented
scheme to the analysis of multi-object scenes is a possible next step in active fusion
research. We have mentioned above the possibility of using robust eigenspace methods
at the object recognition level. These efforts are now close to being mature enough
to allow for object-pose hypotheses derived from images of multi-object scenes. We
can then back-project the image regions to 3D space, obtaining filled 3D tubes that
indicate the possible 3D locations of the real object. At each active step one can
infer approximate 3D locations of the objects by calculating the intersections of
these tubes (for example through multiplication if the “tubes” represent probability
distributions). One could use an occupancy-grid-like data structure for this task, but
it may be preferable to use parametric models to describe the tubes. The full
implementation of this idea requires a more sophisticated registration procedure
in order to know which tubes to intersect. In most cases it will be sufficient to rely on
the obtained high-level hypotheses for registration. Only if the same object occurs
multiple times in the scene will one have to resort to geometric considerations or
registration of low-level features. First steps towards the registration of low-level
features using a graph-based object model have been undertaken in [32, 31]. One
interesting aspect of this approach is that one could try to establish a 3D scene
description even though one is still using only view-based representations for object
recognition.


• Finally, it would be interesting to apply the given active fusion algorithm to other
problems that can be formulated in a similar manner. One such problem is robot
self-localization. There the problem is to establish the position x and pose φ of
the robot. The goal is thus to obtain the quantities c(x, φ | I_1, ..., I_n), where I_1, ..., I_n
denote the images (e.g. of the walls) seen by the robot. The whole fusion and
planning algorithm can closely follow the lines established in chapter 3. Fig. 10.1
may serve to motivate the approach. Consider a robot in charge of watering plants
in an office and trying to estimate its location. If there are similar plants at different
positions, the location cannot be determined unambiguously by looking just at the
plant. A good idea for the robot depicted in Fig. 10.1 would be to turn its head right
to see Monet's Water Lilies on the other wall⁴. This feature would allow the robot to
identify its position unambiguously. The planning algorithm presented in this thesis
could be used by the robot to find the suggested best action automatically. Again,
active planning only makes sense if the robot has some knowledge about the
environment that justifies certain expectations. In less well-known environments
(where reliable expectations are hard to obtain) a lost robot would probably do best to
turn its head in various random directions, just as lost people do when they have
little clue where to look next.
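The planning step for this localization example can be sketched in Python. Everything in the sketch is hypothetical: two candidate locations "A" and "B", two head movements, and a hand-written observation model in which only location "A" shows the Monet painting when turning right. Fusion uses the product rule (Bayesian updating), the robot pose φ is ignored, and Shannon entropy merely stands in for the measures of non-specificity studied in this thesis. As in the presented planning algorithm, every available action is evaluated exhaustively.

```python
import math

# Hypothetical observation model: P(observation | location, action).
# At both candidate locations the robot sees a similar plant straight ahead;
# only at location "A" does turning right reveal the Monet painting.
OBS_MODEL = {
    "look_ahead": {"A": {"plant": 1.0}, "B": {"plant": 1.0}},
    "turn_right": {"A": {"monet": 0.9, "wall": 0.1}, "B": {"wall": 1.0}},
}

def entropy(dist):
    """Shannon entropy in bits (a stand-in for a non-specificity measure)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def fuse(belief, action, obs):
    """Product-rule fusion of the current belief with one new observation."""
    post = {loc: belief[loc] * OBS_MODEL[action][loc].get(obs, 0.0) for loc in belief}
    total = sum(post.values())
    return {loc: p / total for loc, p in post.items()}

def expected_entropy(belief, action):
    """Entropy of the fused belief, averaged over all observations the action may produce."""
    outcomes = {o for loc in belief for o in OBS_MODEL[action][loc]}
    result = 0.0
    for obs in outcomes:
        p_obs = sum(belief[loc] * OBS_MODEL[action][loc].get(obs, 0.0) for loc in belief)
        if p_obs > 0:
            result += p_obs * entropy(fuse(belief, action, obs))
    return result

def best_action(belief):
    """Exhaustively evaluate every available action and keep the most informative one."""
    return min(OBS_MODEL, key=lambda a: expected_entropy(belief, a))

belief = {"A": 0.5, "B": 0.5}
print(best_action(belief))  # turn_right
```

Under this model, looking ahead leaves the belief at 50/50 (expected entropy of 1 bit), whereas turning right lowers the expected entropy to about 0.24 bits, so the sketch selects the head turn. If the observation model does not reflect the real environment, these expectations, and with them the selected action, lose their value, in line with the discussion of predictability above.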


⁴We apologize for the poor quality of the reproduction.
Figure 10.1: The considered algorithm for planning in active object recognition can also be
applied to other problems like robot self-localization. 
What Role will Active Fusion Play in the Future ?


Predicting the future of the active fusion framework goes far beyond what has been
established in this thesis. Nevertheless, we think that the obtained results provide at least
some hints when speculating about further applications of the active fusion paradigm. We
have seen that within constrained application domains there is much room for further
developments. The approach has proven successful in eigenspace-based object recognition,
and there is no reason to expect a complete failure of the algorithm in any of the
extensions suggested above. Since the active fusion paradigm is actually much more
general than the presented algorithm, we expect active fusion to play an important role
in many other fields as well.


However, apart from the costs of planning and activity, the universal applicability of the
active fusion paradigm has not yet been established beyond doubt. The findings in this
thesis indicate some hard constraints for active planning to be successful, such as reliable
knowledge about the environment and how it is affected by various actions
(predictability). It remains to be tested for how many interesting applications these
constraints can be fulfilled, or whether some important application domains necessarily
require complementary approaches. In this context, the application of active fusion to
generic and qualitative object recognition is one of the most interesting issues for future
research. If the framework also passes this test successfully, that might well be regarded
as a major breakthrough.